New  Audio  Technology 
for  Blind  People 


editors 

J  M  Gill  and  G  Butcher 

February  1989 


Concerted  Action 

on 

TECHNOLOGY 

and 

BLINDNESS 


Medical  &  Health 
Research  Programme 
of  the 

European  Community 





printed  by 

Technical  Development  Department 
Royal  National  Institute  for  the  Blind 
224  Great  Portland  Street 
London W1N 6AA
England 


ISBN  0  901797  413 


Concerted  Action  on  Technology  and  Blindness 

BME  Secretary 

Dr  V  Thevenin,  CEE,  DG  XII,  200  Rue  de  la  Loi,  B-1049  Bruxelles,  Belgium.  Tel:  +  32-2-235  0034;  Telex  21877  Comeu  B; 
Fax  +32-2-235  0145. 

BME  Responsible 

Dr  A  Sargentini  Daniele,  Laboratorio  Tecnologie  Biomediche,  Istituto  Superiore  Sanita,  Viale  Regina  Elena  299,  00161 
Roma,  Italy.  Tel:  +39-6-495  5763;  Telex  610071  Istisan;  Fax +39-6-495  7621. 

Dr  B  van  Eijnsbergen,  TNO  Corporate  Planning  Department,  PO  Box  297,  Juliana  van  Stolberglaan  148, 2501 BD  The  Hague, 
The  Netherlands.  Tel:  +  31-70-496442;  Telex  31660  TnogvNL;  Fax  +31-70-855700. 

PMG  Members 

Project  Leader:  Dr  P  L  Emiliani,  IROE-CNR,  Via  Panciatichi  64, 50127  Firenze,  Italy.  Tel:  +  39-55-4378512;  Telex  570231 
Iroe;  Fax  +  39-55-410893. 

Dr J M Gill, Royal National Institute for the Blind, 224 Great Portland Street, London W1N 6AA, England.
Tel: +44-1-388 1266 ext 2371; Fax +44-1-388 2034.

Dr  P  Graziani,  IROE-CNR,  Via  Panciatichi  64, 50127  Firenze,  Italy.  Tel:  +39-55-4378512;  Telex  570231  Iroe;  Fax  +39-55- 
410893. 

Miss  J  Silver,  Moorfields  Eye  Hospital,  City  Road,  London  EC1V  2PD,  England.  Tel:  +44-1-253  3411  ext  2100;  Telex  266129 
Mehcr  G;  Fax  +  44-1-250  3207. 

Dr  M  Soede,  Institute  for  Rehabilitation  Research,  Zandbergsweg  111,  6432  CC  Hoensbroek,  The  Netherlands.  Tel:  +  31- 
45-224300;  Fax  +31-45-226370. 

Dr  J  A  Soerensen,  Electronics  Laboratory,  Technical  University  of  Denmark,  Building  344,  DK-2800  Lyngby,  Denmark. 
Tel:  +  45-2-881566;  Telex  37529  Dthdia. 

Dr  M  Truquet,  Centre  Tobia,  Universite  Paul  Sabatier,  118  Route  de  Narbonne,  F-31062  Toulouse  Cedex,  France.  Tel: 
+  33-61-556944;  Telex  512880;  Fax  +33-61-556470. 

Mr R F V Witte, Deutsche Blindenstudienanstalt, Am Schlag 8-10, D-3550 Marburg/Lahn, German Federal Republic.
Tel: +49-6421-606100; Telex 4821106; Fax +49-6421-606229.



Contents 


Medical and Health Research Programme of the EC

Introduction

Multi-lingual Text-to-Speech for the Visually Disabled
Bjorn Granstrom

Speech Quality Assessment
Herman Steeneken

Text to Speech Synthesis System Screen Reader for Visually Disabled People
Jean-Jacques Rigoni

A Word Processor for Blind Koreans
Emerson Foulke, John Kilpatrick & Young Woo Kang

Human Factors Issues for the Use of Speech Technology in Information Products
Stephen Furner

Digital Daily Newspapers for Blind People in Sweden
Jan-Ingvar Lindstrom

Applications of Vocoder Techniques for the Blind
Paolo Graziani

Telework Projects
Sean Kenny

Audio Technology in IT Education for the Blind
Jill Hewitt

Audio Technology for Blind Programmers
Gerry Ellis

Summary and Recommendations
Emerson Foulke

Other publications


Medical  and  Health  Research  Programme  of  the  EC 
Biomedical  Engineering  in  the  European  Community 


The  involvement  of  the  European  Community  (EC)  in  the  field  of  Medical  and  Health  Research 
started  in  1978  with  the  first  Programme  which  contained  three  projects.  Since  then,  it  has  steadily 
expanded  and  it  will  include  about  120  projects  by  the  end  of  the  fourth  Programme  (1987-1991). 

The  general  goal  of  the  programme  is  clearly  to  contribute  to  a  better  quality  of  life  by  improving 
health,  and  its  distinctive  feature  is  to  strengthen  European  collaboration  in  order  to  achieve  this  goal. 

The  main  objectives  of  this  collaboration  are: 

1.  Increase  the  scientific  efficiency  of  the  relevant  research  and  development  efforts  in  the 
Member  States  through  their  gradual  coordination  at  Community  level  following  the 
mobilization  of  the  available  research  potential  of  national  programmes,  and  also  their 
economic  efficiency  through  sharing  of  tasks  and  strengthening  the  joint  use  of  available 
health  research  resources, 

2.  Improve  scientific  and  technical  knowledge  in  the  research  and  development  areas 
selected  for  their  importance  to  all  Member  States,  and  promote  its  efficient  transfer 
into  practical  applications,  taking  particular  account  of  potential  industrial  and 
economic  developments  in  the  areas  concerned, 

3.  Optimize  the  capacity  and  economic  efficiency  of  health  care  efforts  throughout  the 
countries  and  regions  of  the  Community. 

The current programme consists of six research targets. Four are related to major health problems:
cancer, AIDS, age-related problems, and personal environment and life-style related problems. Two
are related to health resources: medical technology development and health services research.

Funds  are  provided  by  the  Community  for  relevant  "concerted  action"  activities  which  consist  of 
research  collaboration  and  coordination  in  EC  Member  States  and/or  in  other  European  participant 
countries.  Networks  of  research  institutes  can  be  set  up  and  supported  by  means  of  meetings, 
workshops,  short-term  staff  exchanges/visits  to  other  countries,  information  dissemination  and  so  on: 
centralized  facilities  such  as  data  banks,  computing,  and  preparation  and  distribution  of  reference 
materials  can  also  be  funded.  The  funds  are  not  direct  research  grants;  the  institutes  concerned  must 
fund  the  research  activities  carried  out  within  their  own  countries  -  it  is  the  international  coordination 
activities  which  are  eligible  for  Community  support.  Each  such  research  network  is  placed  under  the 
responsibility  of  a  project  leader  chosen  from  among  the  leading  scientists  in  the  network,  with  the 
assistance  of  a  project  management  group  representing  the  teams  participating  in  the  network. 

The Commission of the European Communities is assisted in the execution of this Programme by a
Management and Coordination Advisory Committee (CGC - Medical and Health Research), and by
Concerted Action Committees (COMACs) and Working Parties, composed of representatives and of
scientific experts respectively, designated by the competent authorities of the Member States.

Other  European  Countries,  not  belonging  to  the  EC  but  participating  in  COST  (Cooperation  in 
Science  and  Technology)  may  take  part  in  the  Programme. 

The  present  work  was  conducted  according  to  the  advice  of  COMAC-BME  which  supervises  the 
coordination  of  research  in  biomedical  engineering  (BME)  within  the  Medical  Technology 
Development  target. 

More  information  may  be  obtained  from: 

Commission  of  the  European  Communities 
Directorate  General  XII-F-6 
200  Rue  de  la  Loi 
B  -  1049  Brussels 

Tel:  +  32-2-235.00.34  /  Telex:  COMEU  B  21877 
Telefax:  +32-2-235.01.45;  +32-2-236.20.07 




Introduction 


The  basic  goal  of  the  workshop  was  to  identify  priorities  for  future  research  on  audio  technology  for 
the  benefit  of  visually  disabled  persons.  The  workshop  was  held  in  Copthorne,  England  from  26th  to 
28th  October  1988.  The  meeting  was  chaired  by  Dr  Emerson  Foulke  and  Dr  John  Gill. 

Before  the  workshop,  participants  submitted  papers  as  a  basis  for  discussion.  Since  an  important  aspect 
of  a  workshop  is  the  discussion,  the  following  papers  were  revised  by  participants  incorporating  ideas 
discussed  during  the  workshop. 

Participants 

Mr  Gary  Berger,  PO  Box  298,  Tigerville,  South  Carolina  29688,  USA. 

Ms Gillian Butcher, Royal National Institute for the Blind, 224 Great Portland Street, London W1N
6AA, England.

Mr  Gerry  Ellis,  Bank  of  Ireland  Computer  Centre,  Cabinteely,  Dublin  18,  Ireland. 

Dr  Pier  Luigi  Emiliani,  IROE-CNR,  Via  Panciatichi  64,  50127  Firenze,  Italy. 

Dr Emerson Foulke, Perceptual Alternatives Laboratory, University of Louisville, Louisville,
Kentucky 40292, USA.

Mr Stephen Furner, Human Factors Division, British Telecom Research Laboratories, Martlesham
Heath, Ipswich IP5 7RE, England.

Dr John Gill, Royal National Institute for the Blind, 224 Great Portland Street, London W1N 6AA,
England.

Dr  Bjorn  Granstrom,  Speech  Communication  and  Music  Acoustics,  Royal  Institute  of  Technology,  S- 
100  44  Stockholm,  Sweden. 

Ms Jill Hewitt, School of Information Science, Hatfield Polytechnic, College Lane, Hatfield,
Hertfordshire, England.

Mr  Sean  Kenny,  National  Rehabilitation  Board,  25  Clyde  Road,  Balls  Bridge,  Dublin  4,  Ireland. 

Dr  Jan-Ingvar  Lindstrom,  Handikappinstitutet,  PO  Box  303,  S-161  26  Bromma,  Sweden. 

Mr  Jean  Jacques  Rigoni,  Elan  Informatique,  20  rue  des  Freres  Lumiere,  ZA  Nord,  31520  Ramonville 
St  Agne,  France. 

Dr Herman Steeneken, Instituut voor Zintuigfysiologie/TNO, Kampweg 5, Soesterberg, The
Netherlands.

Dr  Constantine  Stephanidis,  c/o  Foundation  of  Research  and  Technology  -  Hellas,  PO  Box  1385,  GR 
71110  Iraklio,  Crete,  Greece. 


Multi-lingual  Text-to-Speech  for  the  Visually  Disabled 

Bjorn  Granstrom 

Royal  Institute  of  Technology,  Sweden 




Abstract 

The introduction of speech synthesis has been of revolutionary importance for many visually impaired
persons (1), (2). The impact of text-to-speech technology can be compared to that of the braille writing
system in reducing the information handicap of the blind. One important factor is that the new
technology addresses a much bigger user group than the braille system ever did. It must however be
realized that writing and speech are rather different media. Applying text-to-speech technology in
an optimal way is not an easy task, and a lot of research and development is still needed in this area.

Special demands by the blind on the speech synthesis system can be identified. Both hyper-correct
reading and very high speaking rates are needed. On the European scene the multi-lingual situation
causes several problems. The individual markets for most of the national languages are relatively small,
while many individuals need several languages in their everyday life and work. Some ways out of this
dilemma will also be discussed.

The  case  for  multi-lingual  text-to-speech 

Most of the population in Europe comes into contact with more than one language. This is obvious in
multi-lingual societies like Switzerland and Belgium. Most schools in Europe have foreign languages
on their mandatory curriculum. With the opening of the borders in Europe more and more people
will come into direct contact with several languages on an almost daily basis. Text-to-speech devices,
whether they are used professionally or not by blind persons, ought to have a multi-lingual capability.
Already today Infovox, the only producer of a truly multi-lingual system, estimates that about 25% of
its sales in applications for the blind are delivered with more than one language.

One  system  or  many? 

A  general  problem  with  aids  for  the  handicapped  is  that  the  market  is  not  regarded  as  commercially 
interesting.  This  problem  is  accentuated  when  it  concerns  sophisticated  reading  aids  for  the  visually 
impaired,  in  particular  when  the  devices  are  mono-lingual,  and  hence  targeted  to  a  relatively  small 
national  market. 

A striking example is the reading machine with synthetic speech. There have been a substantial number
of research and development projects aimed at producing such a machine. However, the Kurzweil
reading machine is the only device that has had any considerable commercial impact. The recent efforts
to bring it to the European market have not been very successful. One reason is that neither the
character recognition part nor the speech output part was designed to be multi-lingual. The speech
synthesizer needed to be changed for the non-English versions and that involved considerable, costly
software and hardware modifications. Text-to-speech devices differ in several aspects. The physical
appearance, the interface protocols and the functional capabilities vary in the presently available
systems. In the reading machine example the adjustment was done by the end-product manufacturer.

The  modification  can  also  be  the  responsibility  of  the  text-to-speech  producer.  One  example  is  the 
IBM  Screen  Reader,  a  screen  access  package  that  accepts  several  different  text-to-speech  systems 
ranging  from  the  high  quality,  expensive  Dectalk  to  the  cheap  and  simple  Echo  synthesizer.  The  Screen 
Reader  is  still  available  only  in  the  English  language. 

Need  of  standardization 

One way to cope with the variability of text-to-speech devices is to enforce or at least define some kind
of standard. This has been discussed on a preliminary basis in a sub-committee of the International
Standards Organization (ISO). The aim is to obtain compatibility between the different speech
synthesizers to be included in, for example, reading aids. This compatibility could be on several levels.
The least ambitious approach would be on the functional level, making the command structure identical.
Physical integration and interfacing would still be needed whenever a new text-to-speech device is to
be used.
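
As an illustration only, compatibility on the functional level might look like the following Python sketch. The command names (say, set_rate) and both device syntaxes are invented for this example and do not describe any existing synthesizer or any ISO proposal. The point is that the reading-aid code is written once against the common command set, while each device keeps its own physical interface.

```python
from abc import ABC, abstractmethod
from typing import List


class SpeechSynthesizer(ABC):
    """Hypothetical functional-level standard: the same commands for every device."""

    @abstractmethod
    def say(self, text: str) -> str:
        """Return the device-specific command that speaks `text`."""

    @abstractmethod
    def set_rate(self, words_per_minute: int) -> str:
        """Return the device-specific command that sets the speaking rate."""


class DeviceA(SpeechSynthesizer):
    # Invented command syntax, for illustration only.
    def say(self, text: str) -> str:
        return f"SAY {text}"

    def set_rate(self, words_per_minute: int) -> str:
        return f"RATE {words_per_minute}"


class DeviceB(SpeechSynthesizer):
    # A second invented syntax; this device expresses rate on a 0-9 scale.
    def say(self, text: str) -> str:
        return f"[T]{text}[/T]"

    def set_rate(self, words_per_minute: int) -> str:
        step = max(0, min(9, words_per_minute // 60))
        return f"[R{step}]"


def read_aloud(device: SpeechSynthesizer, text: str, wpm: int) -> List[str]:
    """Reading-aid code written once against the common command set."""
    return [device.set_rate(wpm), device.say(text)]
```

Swapping DeviceA for DeviceB requires no change to read_aloud, which is the benefit the functional level of standardization is meant to give the end-product manufacturer.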

Different degrees of further standardization will reduce the burden on the end-product manufacturer
and hopefully increase the market and decrease the cost. However, since most producers of
text-to-speech systems don't see the market for handicapped persons as their main market, it is unlikely
that this will happen unless the same standard could be adopted for all markets.

The most ambitious standardization is total software compatibility. This would imply that the
text-to-speech software for any language could be run on all standardized devices. Interestingly enough
this also means that the need for physical standardization disappears. One implication of this is that
different end products could be given optimal, integrated designs. This may seem somewhat utopian
considering the many different approaches to speech synthesis presently available, but it offers a new
alternative. The problem of standardizing text-to-speech is reduced to developing the algorithms in a
standardized framework. To a certain extent that is the approach taken at our department and
cooperating groups.

The  RULSYS  approach 

The text-to-speech system that was originally developed in our department, and that is now commercially
available from Infovox AB, was designed from the start to be multi-lingual. The language specific parts
are mostly formulated in a notation close to the one commonly used in the linguistic practice referred to
as "generative phonology". The language independent parts of the system are however efficiently
coded, partially in assembler language for the microprocessor and the signal processor involved. This
ideally needs to be done only once. The task of improving the system for a particular language or
adding a new language to the system will be a continued effort. This work is done in a development
environment that is easy and familiar to many speech and language researchers. There is no need for
conventional skills in computer programming in this process. This means that the expert can directly
test and implement ideas for the text-to-speech system without the problem of communicating them
to a computer programmer, with the potential risk of misunderstandings or loss of information. This
working situation is the key to the many languages presently available.




The  system  is  now  commercially  available  in  British  and  American  English,  German,  French,  Italian, 
Spanish,  Norwegian,  Danish  and  Swedish  (3),  (4),  (5),  (6),  (7).  Some  other  languages  have  also  been 
studied  in  this  context,  but  not  brought  to  a  state  where  they  are  ready  for  commercial  introduction. 
The  generality  of  the  system  has  been  used  in  many  other  research  applications  such  as  in  a  music 
synthesis  project  and  in  modelling  the  speech  of  deaf  persons. 

The DIG box transforms numeric expressions, including, for example, monetary amounts, to
pronunciations/phonetic strings. The LEX component is the only component not formulated as rules.
It contains both whole words and roots that for some reason are not handled by rule. The SUF
component strips the endings from words to be looked up in the LEX. The ROT component merges
roots and endings on the phonetic level. Words that are not processed by the previous components
are passed on to the GRAF rules, which do a rule-based grapheme-to-phoneme transformation. The
BLISS component is used for syntactic analysis based on lexical and word structure information.
Finally the FON component produces parameters for the language-independent speech production
model, the speech synthesizer.
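
The order in which the word-level components hand work to each other can be sketched in a few lines of Python. This is a toy under stated assumptions, not the RULSYS notation or the Infovox implementation: the lexicon entries, the endings, the ad hoc phonetic symbols and the one-letter-one-symbol GRAF rule are all invented, DIG simply reads digits one by one, and BLISS and FON are not modelled.

```python
import re

# Toy data, invented for illustration; the phonetic symbols are ad hoc.
LEX = {"the": "DH AX", "walk": "W AO K"}   # whole words and roots
ENDINGS = {"ing": "IH NG", "s": "Z"}       # endings known to SUF and ROT
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def dig(token):
    """DIG stand-in: expand a digit string into words, digit by digit."""
    return [DIGIT_WORDS[int(d)] for d in token]


def suf(word):
    """SUF stand-in: strip a known ending if the remaining root is in LEX."""
    for ending in sorted(ENDINGS, key=len, reverse=True):
        root = word[:-len(ending)]
        if word.endswith(ending) and root in LEX:
            return root, ending
    return word, ""


def rot(root, ending):
    """ROT stand-in: merge root and ending on the phonetic level."""
    return f"{LEX[root]} {ENDINGS[ending]}"


def graf(word):
    """GRAF stand-in: a trivial one-letter-one-symbol fallback rule."""
    return " ".join(word.upper())


def word_to_phones(word):
    """Try LEX first, then SUF plus ROT, and fall back to GRAF."""
    if word in LEX:
        return LEX[word]
    root, ending = suf(word)
    if ending:
        return rot(root, ending)
    return graf(word)


def transcribe(text):
    """Return one phonetic string per word of the input text."""
    phones = []
    for token in re.findall(r"[a-z]+|[0-9]+", text.lower()):
        words = dig(token) if token.isdigit() else [token]
        phones.extend(word_to_phones(w) for w in words)
    return phones
```

For example, transcribe("The walking cab") takes "the" from the lexicon, builds "walking" from the root "walk" and the ending "ing", and passes "cab" to the fallback rule.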

Adding  a  new  language 

The driving force for developing a new language for the system has varied. Some of the languages
were originally developed under Swedish research contracts. Being part of a very small language
community of 9 million persons, Swedes frequently have to work in other languages. For the further
development, co-operation has been established with other research teams throughout Europe. Some
of this development for the major European languages has been supported by Infovox, anticipating
commercial benefits. Other projects, like Danish and Norwegian, have been financed as handicap
research to make the technology and applications available to the handicapped community.

It is hard to estimate the effort and cost needed to create a version of the text-to-speech
system for a new language. The quality criterion for a system is strongly tied to the motivation of the
user and the specific application. The quality will only gradually, after many years of hard work, get
close to the quality of human speech. A workable system, however, has in some instances been running
after less than one man-year of work.

Languages differ very much in the complexity needed for the different components of the
text-to-speech system. The background knowledge available for different languages is also quite
different. In some countries with a strong tradition in acoustic phonetics and linguistics it will be much
easier to find the right experts. Much of the knowledge is already available, even if it is not expected
to be in a form appropriate for the text-to-speech system. One main factor in deciding the complexity
of the task is how obvious and rule governed the relation between the orthography and the phonetic
transcription is. Another important factor is how similar the phonetic structure is to other languages
already developed.




Special  needs  for  the  blind 

There is a very practical need for different speaking styles in text-to-speech systems. Such systems are
now used in a variety of applications and many more are projected as the quality is developed. The
range of applications calls for a variation close to the one found in human speakers. Reading stock
quotations, weather reports, electronic mail or warning messages are examples where humans would
choose rather different ways of reading. The most common application today is in aids
for the handicapped.

Visually impaired persons have very specific needs. At one extreme of the style continuum they
want hyper-correct speech that gives them maximally exact information about how the text is written.
Spelling is in this case a possibility, but it is too slow and gives hardly any understanding of the text
content. The other extreme is very fast speech for information scanning.

Fast  speech  for  the  visually  impaired 

Even if there are means of varying the speaking rate in the normal text-to-speech system, they are not
appropriate for the extremely high speaking rates demanded by the visually impaired. Normal
speaking rate is often estimated to be around 150 words per minute (wpm). The demand from the
blind is to obtain speaking rates of around 500 wpm to approximate fast silent reading by sighted
persons. However, words per minute is a rather uncertain measure of speaking rate. This measure
will be language and text dependent and it will also increase if pauses are not included. Texts vary a
lot in mean word length. Languages also vary considerably. In a study of European languages
we found that German words were more than 50% longer than French in the corpora of the 10,000
most frequent words in the language (9). One way to obtain these "super human" rates would be to
ignore progressively more of the low information content of the text. This would mean some kind of
key word reading without any language structure. After preliminary tests we abandoned this idea. One
problem with this solution is how to predict where the keywords are. It also seemed to us that the lack
of linguistic structure was very confusing. If at all possible, it is our conviction that we should model
the speech on human performance. At these high speeds, however, there is no good human template.
We don't want to extrapolate from the point where human speech production breaks down.
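
Why words per minute is an uncertain measure can be seen from a small calculation, given here as a Python sketch. The mean word lengths of 6 and 9 letters are hypothetical figures chosen only so that one is 50% longer than the other; they are not the values measured in the study (9).

```python
def letters_per_second(wpm: float, mean_word_length: float) -> float:
    """Letters spoken per second at a given rate, ignoring pauses."""
    return wpm * mean_word_length / 60.0


def listening_minutes(n_words: int, wpm: float) -> float:
    """Minutes needed to listen to a text of n_words at a given rate."""
    return n_words / wpm


def equivalent_wpm(wpm: float, length_from: float, length_to: float) -> float:
    """Rate in a second language giving the same letters per second."""
    return wpm * length_from / length_to


# Hypothetical mean word lengths, in letters.
SHORT_WORDS = 6.0
LONG_WORDS = 9.0   # 50% longer than SHORT_WORDS

fast_short = letters_per_second(500, SHORT_WORDS)          # 50.0
fast_long = letters_per_second(500, LONG_WORDS)            # 75.0
matched = equivalent_wpm(500, SHORT_WORDS, LONG_WORDS)     # about 333 wpm
```

Under these assumptions, 500 wpm in the language with longer words demands half as many letters per second again as 500 wpm in the other, so the same nominal rate is a considerably harder target. A 3000 word text takes 20 minutes at 150 wpm and 6 minutes at 500 wpm.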

The rationale in our current attempt to increase speed further is that the phonetic component could
be exchanged at run-time for a modified and faster component appropriate for very high speaking rates.
Prosodic rules in particular are simplified or removed. The differences between stressed and unstressed
syllables are still marked by duration and fundamental frequency, but a new set of inherent durations
is established and the mean pitch also needs to be increased. To most listeners the speech is close to
unintelligible at this speed, though it is claimed to be useful by experienced blind listeners.


Conclusion 

The multi-lingual text-to-speech technology offers very promising prospects for the visually impaired.
It has already been of revolutionary importance to quite a few blind individuals in the vocational setting.
Presently there is only one truly multi-lingual system on the market and that system doesn't even
include some of the smaller European languages. On the research and development side much effort
is spent to overcome this problem both nationally and within the European Community. One example
is the Esprit project, Polyglot, that will support the future development of such systems. It is important
that the specific needs of the visually impaired can be taken into consideration in such development.
One leading ambition should also be that the applications developed can be shared
between different language communities to increase the availability of the new technology.


References 

1. Carlson, R. & Granstrom, B. (1986): Applications of a multi-lingual text-to-speech system for the
visually impaired, pp. 87-96 in (P L Emiliani, Ed.): Development of Electronic Aids for the Visually
Impaired, Martinus Nijhoff/Dr W Junk Publ., Dordrecht.

2. Granstrom, B. (1987): Speech technology for the visually impaired - the Swedish perspective,
STL-QPSR 1/1987, pp. 29-38.

3. Bladon, A., Carlson, R., Granstrom, B., Hunnicutt, S. & Karlsson, I. (1987): Text-to-speech system for
British English, and issues of dialect and style, European Conference on Speech Technology, vol. 1,
Edinburgh, Scotland.

4. Kohler, K. (1988): An intonation model for a German text-to-speech system, Proceedings of Speech
'88, 7th FASE symposium, Edinburgh, Scotland.

5. Barber, S., Granstrom, B. & Touati, P. (1988): French prosody in a rule-based text-to-speech system,
Proceedings of Speech '88, 7th FASE symposium, Edinburgh, Scotland.

6. Granstrom, B. & Gustafson, K. (1986): Toneme 11/2 in a Norwegian text-to-speech system, Nordisk
Prosodi IV, Odense.

7. Granstrom, B., Molbaek Hansen, P. & Gronnum Thorsen, N. (1987): A Danish text-to-speech system
using a text normalizer based on morph analysis, European Conference on Speech Technology, vol. 1,
Edinburgh, Scotland.

8. Carlson, R. & Granstrom, B. (1986): Linguistic processing in the KTH multi-lingual text-to-speech
system, pp. 2403-2406 in Proc. ICASSP 86, vol. 4, Tokyo.

9. Carlson, R., Elenius, K., Granstrom, B. & Hunnicutt, S. (1986): Phonetic properties of the basic
vocabulary of 5 European languages: Implications for speech recognition, pp. 2763-2766 in Proc.
ICASSP 86, vol. 4, Tokyo.




Speech  Quality  Assessment 


Herman  J  M  Steeneken 
TNO  Institute  for  Perception,  The  Netherlands 


Introduction 

Speech  synthesis  has  been  available  for  more  than  twenty  years.  Starting  with  simple  systems  which 
are  able  to  reproduce  short  pre-recorded  speech  tokens,  the  field  has  developed  to  systems  converting 
text  to  speech. 


In  general,  speech  output  systems  are  based  on  waveform  coding,  storage  and  reproduction.  More 
advanced  systems  are  based  on  the  coding  of  specific  speech  parameters  such  as  spectral  shape, 
fundamental  frequency,  etc.  The  latter  method  results  in  a  more  efficient  coding  but  has  in  general  a 
lower  speech  quality. 


In Table 1 some examples of waveform coders and more efficient speech coding algorithms are given
together with an estimated memory requirement (expressed in bits per second) and speech quality.
Up to now, more efficient coding has led to lower speech quality but to more flexibility. A text-to-speech
system can be based on pre-stored elementary speech components like phonemes or diphones. Storing
such elementary speech components opens the possibility of reproduction in any desired order.
However, to obtain intelligible speech of acceptable quality some other aspects have to be taken
into account. For instance, word stress, sentence accent and intonation contours have a major
effect on acceptability. Evaluation of speech output systems is therefore required, both to obtain
performance figures and to obtain diagnostic information for improving the systems investigated.


Table 1. Principle of some speech analysis and speech synthesis systems together with the required
storage capacity in bits per second and the resulting speech quality.

System (speech in, amplifier and speech out)                Storage            Intelligibility

Analog registration on magnetic tape                        -                  excellent

Analog-to-digital convertor, digital memory (huge),         approx. 100 kB/s   excellent to good
digital-to-analog convertor

LPC-analysis or formant analysis, digital memory,           approx. 5 kB/s     fair to poor
LPC-synthesis or formant synthesis

Coding with data reduction (i.e. phonemes, diphones),       0.5-2 kB/s         poor
digital memory (small), decoding

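
To give a feeling for what the rates in Table 1 mean in practice, the storage needed for a stretch of speech is simply the rate multiplied by the duration. The short Python sketch below applies this to one minute of speech. It takes the rates as printed in the table and deliberately leaves the unit untouched, so the results are in the same unit as the table (per second removed); the one-minute duration is an arbitrary choice.

```python
def storage_needed(rate_per_second: float, seconds: float) -> float:
    """Storage for `seconds` of speech, in the unit of the rate."""
    return rate_per_second * seconds


ONE_MINUTE = 60.0

# Rates as printed in Table 1 (unit as in the table).
waveform = storage_needed(100.0, ONE_MINUTE)        # digitized waveform
parametric = storage_needed(5.0, ONE_MINUTE)        # LPC or formant coding
units_low = storage_needed(0.5, ONE_MINUTE)         # phoneme/diphone coding, lower bound
units_high = storage_needed(2.0, ONE_MINUTE)        # phoneme/diphone coding, upper bound

# How many times more storage the waveform coder needs than LPC/formant coding.
saving = waveform / parametric
```

On these figures a minute of digitized waveform needs twenty times the storage of a minute of LPC or formant coded speech, which is the trade-off against speech quality that the table summarizes.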



In  some  countries  (UK,  France  and  The  Netherlands)  joint  national  research  programs  are  starting  to 
co-ordinate  research  efforts.  Also  a  European  research  project  (sponsored  by  ESPRIT)  started  in 
1988.  With  this  ESPRIT  SAM  project  (multi-lingual  speech  input/output  assessment,  methodology 
and  standardization)  seven  countries  work  together  on  the  development  and  evaluation  of  speech 
input/output.  In  the  USA  an  advanced  research  program  is  proceeding  on  the  development  and 
application  of  speech  input/output  systems  in  military  conditions.  In  NATO  a  research  study  group 
(RSG-10)  is  involved  with  the  application  of  speech  input/output  in  the  multi-lingual  military 
environment. 

Speech  Quality  Assessment 

The assessment of a speech output system depends on the type of system which is involved. For a
waveform coder, based on real speech tokens, a segmental intelligibility test on phoneme or word level
will be satisfactory. Other acceptability items do not depend on the system itself and are defined by
the speech tokens pronounced by the talker. However, an allophone or diphone based system also
affects the intonation contours, so besides a segmental intelligibility test a supra-segmental test, up to
the sentence level, is required.

A  number  of  subjective  tests  were  developed  during  the  forties,  and  are  extensively  used  for  the 
evaluation  of  speech  communication  channels.  There  are  also  two  objective  tests  available.  These 
tests are based on the generation and analysis of a special speech-like test signal. We can classify the
intelligibility tests according to aspects of their use, such as: items tested, diagnostic information,
number of subjects required for reliable results, training and measuring time.

Another  aspect  is  the  application: 

are  we  comparing  and  rank-ordering  systems? 
are  we  evaluating  a  system  for  a  specific  application? 
are  we  supporting  the  development  of  a  system? 

When we restrict ourselves to the subjective tests, a general classification can be made according to
the items tested and the method of response. The lowest level (phonemes) is covered by the rhyme tests
and word tests. A rhyme test is a multiple choice test in which a listener has to select the auditorily
presented word from a small group of visually presented possible responses. In general only the initial
consonants of the response words differ, as in Bam, Dam, Tam, Kam, Pam. Frequently used rhyme tests
are the Diagnostic Rhyme Test (DRT) and the Modified Rhyme Test (MRT).

The DRT is based on two alternatives (Voiers 1977) while the MRT is based on six alternatives (House
et al. 1965). As the number of responses is limited, the alternatives offered may not include what the
listener actually heard. A more general approach is therefore an open response, as used in word tests.

Word  tests  are  based  on  short  nonsense  or  meaningful  words  of  the  CVC  type 
(consonant-vowel-consonant).  The  test  words  are  presented  in  isolation  or  in  a  carrier  phrase.  The 
listener  can  respond  with  any  CVC  combination  he  has  heard.  Hence  all  confusions  between  the 
phonemes  are  possible.  The  test  results  include  the  phoneme  score,  the  word  score  and  the  confusions 
between  the  initial  consonants,  vowels  and  final  consonants.  The  confusion  matrices  present  useful 
information  to  improve  the  performance  of  a  system.  The  speech  reception  threshold  (SRT)  measures 
the word or sentence intelligibility against a level of masking noise. The listener has to recall a
presented sentence which was masked by noise. For a correct response the noise level is increased
while  for  a  false  response  the  noise  level  is  decreased.  This  procedure  leads  to  a  small  range  of  noise 
levels  where  a  50%  correct  recall  of  the  presented  sentences  is  obtained  (Plomp  and  Mimpen  1979). 
The  quality  of  the  speech  is  related  to  the  amount  of  noise  which  is  necessary  for  the  masking.  The 
procedure has the advantage that it can be performed with naive listeners. A recent development
is the use of anomalous sentences. These syntactically correct but semantically anomalous sentences
consist of approximately seven words. The words are high frequency mono-syllabic words, from which
an unlimited number of sentences can be generated randomly, although the sentences are
constructed according to pre-defined grammatical structures. This test will be evaluated by the
ESPRIT  project  SAM. 
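
The up-down rule of the speech reception threshold procedure can be illustrated with a small
simulation. This is only a sketch under stated assumptions: the simulated listener (a logistic
function with a hypothetical 50% point at -5 dB), the 2 dB step, the levels and all function names
are invented for illustration and are not taken from Plomp and Mimpen.

```python
import math
import random


def p_correct(snr_db, srt_db=-5.0, slope=1.0):
    """Hypothetical listener: probability of recalling a sentence at a given SNR."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - srt_db)))


def run_staircase(speech_level_db=60.0, start_noise_db=50.0, step_db=2.0,
                  n_sentences=200, seed=1):
    """Raise the noise after a correct response, lower it after a false one.

    Returns the list of noise levels at which sentences were presented.
    """
    rng = random.Random(seed)
    noise_db = start_noise_db
    track = []
    for _ in range(n_sentences):
        track.append(noise_db)
        correct = rng.random() < p_correct(speech_level_db - noise_db)
        noise_db += step_db if correct else -step_db
    return track


def estimate_srt(track, speech_level_db=60.0, skip=20):
    """Average SNR over the track, ignoring the initial approach to threshold."""
    tail = track[skip:]
    mean_noise_db = sum(tail) / len(tail)
    return speech_level_db - mean_noise_db
```

Because each correct response makes the next sentence harder and each false response makes it
easier, the track settles into a narrow band of noise levels around the point where half of the
sentences are recalled, and `estimate_srt(run_staircase())` should land close to the simulated
listener's -5 dB.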

Quality  rating  is  a  more  general  method  used  to  evaluate  the  user  acceptance  of  a  transmission  channel 
or  speech  output  system.  The  claim  of  some  investigators  (Goodman  and  Nash)  is  that  a  quality  rating 
includes  the  total  auditory  impression  of  speech  on  a  listener  and  can  be  used  to  discriminate  between 
good  and  excellent  quality.  For  quality  ratings  normal  test  sentences  are  used  and  the  subject’s 
impression  is  obtained  from  a  free  conversation.  The  listener  is  asked  to  rate  his  impression  on  a 
subjective  scale  like  the  five-point  scale:  bad,  poor,  fair,  good  and  excellent.  Different  types  of  scales 
are  used  such  as:  intelligibility,  quality,  acceptability,  naturalness,  etc. 

Fig. 1 gives, for four intelligibility measures, the intelligibility as a function of the signal-to-noise
ratio of speech combined with noise. This gives some impression of the effective range of each test. The
given relation between intelligibility and the signal-to-noise ratio is only valid for noise with a frequency
spectrum equal to the long term speech spectrum. This is, for instance, the case with voice babble. A
signal-to-noise ratio of 0 dB means that the speech and the noise have equal energy.
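
As a worked illustration of the decibel scale used here, the signal-to-noise ratio is ten times the
base-10 logarithm of the ratio of speech energy to noise energy. The function names below are
assumptions chosen for this sketch.

```python
import math


def mean_energy(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_energy, noise_energy):
    """Signal-to-noise ratio in decibels from two energies."""
    return 10.0 * math.log10(speech_energy / noise_energy)
```

Equal energies give `snr_db(1.0, 1.0) == 0.0`, and noise with twice the energy of the speech gives
roughly -3 dB.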


Fig.  1  Relation  between  signal-to-noise  ratio  and  some  intelligibility  measures 


As an example, the CVC word intelligibility of two speech synthesis systems is given in Table 2. The
phoneme intelligibility, given separately for initial consonant, vowel and final consonant, is included
as well. The intelligibility scores are based on eight listeners. The tests were performed for the Texas
Instruments LPC based TMS5220 speech chip and the Philips MEA 8000 formant based speech chip.




                      TMS 5220   MEA 8000

CVC-words                53         61
initial consonants       59         67
vowels                   98         97
final consonants         87         91

Table 2. Intelligibility score in percent of the CVC-words, initial consonants, vowels and final
consonants for two speech output systems.


In  Table  3  a  confusion  matrix  for  the  (Dutch)  initial  consonants  is  given  as  obtained  with  the  Philips 
MEA  8000.  The  corresponding  consonant  score  according  to  Table  2  is  67%.  As  can  be  concluded 
from  the  relations  between  intelligibility  measures  as  given  in  Fig.  1,  both  systems  fall  in  the  category 
between  fair  and  good  and  both  offer  a  100%  sentence  intelligibility. 


[Table 3: the individual cells of the confusion matrix could not be recovered. The response columns
and stimulus rows both cover B D F G H J K L M N P R S T V W Z. The "corr." column (percent correct
per stimulus) reads as follows.]

Stimulus     corr. (%)

 1  B           4.2
 2  D          87.5
 3  F          25.0
 4  G          29.2
 5  H          58.3
 6  J         100.0
 7  K          75.0
 8  L         100.0
 9  M          81.3
10  N         100.0
11  P          50.0
12  R          64.6
13  S          87.5
14  T         100.0
15  V          40.6
16  W          75.0
17  Z          37.5


Table  3.  Confusion  matrix  of  initial  consonants  obtained  with  the  Philips  MEA  8000 
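
The scores of the kind reported in Table 2 and a confusion matrix of the kind reported in Table 3 can
both be derived from the same list of stimulus-response pairs. The sketch below is an illustration
only: the data layout (each word as a tuple of initial consonant, vowel and final consonant) and the
function name are assumptions, not the scoring program used for the tests.

```python
from collections import Counter

POSITIONS = ("initial", "vowel", "final")


def score_cvc(pairs):
    """Score open-response CVC tests.

    pairs: list of (stimulus, response), each a 3-tuple (initial, vowel, final).
    Returns (scores in percent, confusion counts per phoneme position).
    """
    n = len(pairs)
    correct = {name: 0 for name in POSITIONS}
    confusions = {name: Counter() for name in POSITIONS}
    words_correct = 0
    for stimulus, response in pairs:
        if stimulus == response:
            words_correct += 1
        for name, s, r in zip(POSITIONS, stimulus, response):
            confusions[name][(s, r)] += 1  # one cell of the confusion matrix
            if s == r:
                correct[name] += 1
    scores = {name: 100.0 * correct[name] / n for name in POSITIONS}
    scores["word"] = 100.0 * words_correct / n
    return scores, confusions
```

A word counts as correct only if all three phonemes are correct, which is why the word score is lower
than any of the three phoneme scores.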


Conclusion 

Some  examples  of  evaluation  methods  of  speech  synthesis  systems  have  been  given.  In  a  recent 
ESPRIT  project  several  items  for  future  development  were  identified,  such  as  evaluation  at  sentence 
level  and  objective  evaluation  methods.  An  aspect  not  discussed  in  this  review,  but  relevant  for  the 
application  of  speech  output  systems  in  computer  systems,  is  man-machine  interfacing. 




References 

Goodman, D.J. & Nash, R.D., Subjective quality of the same speech transmission conditions in seven
different countries. IEEE Trans. Comm. 30, 642-654, 1982.

House, A.S., Williams, C.E., Hecker, M.H.L. & Kryter, K.D., Articulation testing methods:
Consonantal differentiation with a closed response set. J. Acoust. Soc. Amer. 37, 158-166, 1965.

Plomp, R. & Mimpen, A.M., Improving the reliability of testing the speech reception threshold for
sentences. Audiology 18, 43-52, 1979.

Steeneken, H.J.M., Diagnostic information of subjective intelligibility tests. Internat. Congress on
Acoust. Speech and Signal Processing, Dallas 1986.

Steeneken, H.J.M., Comparison among three subjective and one objective intelligibility test. Report IZF
1987-8, TNO Institute for Perception, Soesterberg, The Netherlands.

Voiers, W.D., Diagnostic Evaluation of Speech Intelligibility. Chapter 32 in M.E. Hawley (ed.) Speech
Intelligibility and Speaker Recognition, vol. 2, Benchmark Papers in Acoustics, Dowden, Hutchinson
& Ross, Stroudsburg, Pa, 1977.




Text to Speech Synthesis System Screen Reader for Visually Disabled People

Jean-Jacques Rigoni
Elan  Informatique,  France 


1.  Abstract 

The CNET (Centre National d’Etudes des Telecommunications) at Lannion has carried out extensive
research on text-to-speech synthesis systems and has sold many licences for its know-how.
One of these systems has been developed for IBM PC or AT compatibles and is manufactured by Elan
Informatique. The product is called Televox and, at the present time, it is the best synthetic voice
system available in the French language.

In  1986  the  blind  researcher  Jean  Frontin,  from  the  Tobia  Centre,  Universite  Paul  Sabatier,  Toulouse, 
designed  a  screen  reader  which  can  be  used  with  any  commercial  software.  This  screen  reader,  called 
Edivox,  has  been  written  and  produced  to  Jean  Frontin’s  specifications  by  Elan  Informatique.  This 
product  is  the  result  of  the  co-operation  between  a  large  public  laboratory  (CNET  Lannion),  a 
University  (Tobia)  and  a  private  firm  (Elan  Informatique). 

2. The CNET Text-to-Speech Synthesis

The CNET Text-to-Speech System has been in existence since 1977 and is constantly being improved.
It uses diphone synthesis with an LPC synthesiser.

2.1  Diphone  Dictionary 

The diphones are extracted from isolated words pronounced by a speaker, who lends his voice to the
system. About 1150 diphones are recorded, then analysed, corrected and stored in the
diphone dictionary.
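
To make the role of the diphone dictionary concrete, the sketch below splits a phoneme string into
diphones (transitions from one sound to the next) and concatenates the stored units. It is a minimal
illustration, not the CNET implementation: the silence marker, the phoneme symbols and the
dictionary contents are hypothetical.

```python
def to_diphones(phonemes, silence="_"):
    """Return the sequence of diphones covering an utterance, padded with silence."""
    padded = [silence] + list(phonemes) + [silence]
    return list(zip(padded, padded[1:]))


def concatenate(phonemes, dictionary):
    """Look up each diphone in the dictionary and join the stored frames."""
    frames = []
    for unit in to_diphones(phonemes):
        frames.extend(dictionary[unit])
    return frames
```

For example, `to_diphones(["b", "o", "n"])` gives
`[("_", "b"), ("b", "o"), ("o", "n"), ("n", "_")]`, so a three-phoneme word needs four dictionary
entries.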

2.2  The  Orthographic-Phonetic  Transcription 

The French language is very complex to deal with because of its many pronunciation rules.
The orthographic-phonetic transcription program contains about 400 rules (compared with
Spanish, which needs only about 50 rules) to transcribe a text phonetically with minimum errors.


2.3 The Automatic Generation of Prosody

Good prosody makes the synthesis more pleasant to hear, more understandable, and less tiring for
long-term use. The automatic prosody generation system gives the sentence a good prosody while
using only the usual punctuation marks. This prosody can be improved by adding prosodic markers to
the text, either manually or automatically. At the output of the prosodic module, the LPC frames can
be sent to the LPC synthesiser.




2.4  Televox 

The Televox system comprises a large PC card and an MS-DOS driver which groups together the
different modules: orthographic-phonetic transcription, automatic prosody generation and diphone
concatenation. The PC card is composed of the following parts:

•  the  synthesiser  (SDP  Thomson) 

•  a  4W  amplifier  for  an  external  speaker 

• a telephone interface (depending on the model)

3.  Edivox  Screen  Reader 

The Edivox screen reader software is resident in memory and makes it possible, by using a
combination of keystrokes, to suspend the program in progress and to read what is present
on the screen. It provides the following functions:

• moving the reading cursor

• reading letter by letter, word by word, line by line, sentence by sentence, or the
complete screen.
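
The reading functions listed above can be pictured as operations on a copy of the screen text with
a separate reading cursor. The sketch below is an assumption-laden illustration, not the Edivox
code: the class, its method names and the representation of the screen as a list of strings are all
hypothetical, and the text returned by each method would be passed to the speech synthesiser.

```python
class ScreenReader:
    """Reading cursor over a text screen held as a list of row strings."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.row = 0
        self.col = 0

    def move(self, d_row, d_col):
        """Move the reading cursor, keeping it inside the screen."""
        self.row = max(0, min(len(self.rows) - 1, self.row + d_row))
        last_col = max(len(self.rows[self.row]) - 1, 0)
        self.col = max(0, min(last_col, self.col + d_col))

    def letter(self):
        line = self.rows[self.row]
        return line[self.col] if self.col < len(line) else ""

    def word(self):
        """Return the word under the cursor, or "" if the cursor is on a space."""
        line = self.rows[self.row]
        if self.col >= len(line) or line[self.col].isspace():
            return ""
        start = self.col
        while start > 0 and not line[start - 1].isspace():
            start -= 1
        end = self.col
        while end < len(line) and not line[end].isspace():
            end += 1
        return line[start:end]

    def line(self):
        return self.rows[self.row].rstrip()

    def screen(self):
        return "\n".join(row.rstrip() for row in self.rows)
```

Keeping the reading cursor separate from the application's own cursor is what lets the user explore
the screen without disturbing the suspended program.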

Numerous  other  functions  can  be  used,  for  example: 

•  reading  the  ASCII  equivalent  of  a  character 

•  reading  the  cursor  position 

• reporting the keyboard caps lock or number lock mode

• finding a special character string or reverse video zones on the screen.

About  50  functions  are  available. 

A recent release of the system (V4) makes it possible to use windows (3 windows per screen) with
20 different screen models. Macro commands allow about 10 elementary commands to be chained
together. This improvement makes the Edivox system very easy to use with many kinds of programs:
MS Word, Wordperfect, dBase III, Multiplan, Minitel.

4.  Edivox  Utilisation 

Edivox is intended to be used with any software. We have observed that the most frequent uses are:

• text processing (Word, Writing Assistant, Wordperfect, Sprint...) which permits a blind
person to produce his correspondence, a student to type his course notes, etc.

•  the  databases  (dBase  III) 

•  the  Minitel  (see  below) 


Numerous other uses are possible, such as chess programs and special software for kinesitherapists.




5.  Edivox  and  Minitel 

Minitel is a telematic (videotex) network from French Telecom. Three million terminals were given
free to firms and private people. Several hundred services can be reached through this network,
for example, train or plane booking services.

Most  of  the  French  newspapers  (Le  Monde,  Le  Figaro,  Liberation)  have  their  information  available 
through  the  Minitel  network. 

A PC compatible can be turned into a Minitel terminal with a modem and software which can
be bought for less than 2000 French Francs. It is therefore possible for a blind person equipped with
a PC, a modem and Edivox to use all the Minitel services, which offers a wide opening onto the world.

6.  Conclusion 

Edivox is, at the present time, the most widely purchased system in France, with more than 200 sold.
It is a very easily used technical aid for blind people, especially those who have been recently blinded
and don’t know braille. Minitel is a fundamental application. Edivox is an excellent example of
co-operation between a public laboratory, a university and an industrial firm.



A  Word  Processor  for  Blind  Koreans 

Emerson  Foulke,  John  Kilpatrick  &  Young  Woo  Kang 
Perceptual  Alternatives  Laboratory,  USA 


Two  years  ago,  I  was  invited  by  Dr  Young  Woo  Kang,  the  first  blind  person  to  be  awarded  a  degree  by 
a  Korean  university,  to  visit  Taegu  University  in  Taegu,  Korea,  to  deliver  a  series  of  lectures  to  the 
Psychology  Department,  the  Department  of  Special  Education,  and  to  blind  high  school  and  college 
students  who  came  from  all  over  the  country  to  hear  the  lectures.  I  was  also  asked  to  serve  as  a 
consultant  in  regard  to  the  use  of  computers  by  blind  operators,  and  to  conduct  demonstrations  of 
computers,  synthesizers,  braille  embossers,  computer  terminals  with  volatile  braille  displays,  software 
designed  for  use  by  blind  computer  operators,  and  so  forth. 

As a result of my conversations with these students, I learned that, although they are hungry for
knowledge  about  computers  and  about  the  ways  in  which  computers  might  be  useful  to  them,  they 
cannot  use  the  computers  now  available.  These  computers  have  been  designed  for  use  by  operators 
who  speak  English.  English  meanings  are  assigned  to  their  keys,  English  text  is  displayed  on  their 
screens,  and  the  prompts  that  guide  interaction  with  the  computer  are  expressed  in  English.  I  also 
learned  that  blind  Koreans  are  still  oppressed  by  a  traditional  culture  that  tries  to  relegate  them  to  the 
occupations  of  massage,  acupuncture  and  fortune  telling.  These  students  are  beginning  to  demand  the 
freedom  to  choose  the  ways  in  which  they  will  strive  to  become  productive  members  of  their  society, 
and  they  feel  that  they  will  be  helped  in  this  struggle  by  acquiring  the  skills  blind  persons  can  offer 
when  they  have  learned  how  to  use  computers. 

As  a  result  of  our  discussions,  I  agreed  to  see  what  could  be  done  about  working  out  the  modifications 
that  would  allow  blind  Koreans  to  use  personal  computers  conveniently.  Then,  the  President  of  Taegu 
University  established  a  new  university  committee,  appointed  me  to  membership  on  the  committee, 
and  charged  us  with  the  responsibility  of  developing  the  modifications  of  computer  hardware, 
computer  software,  or  both,  that  will  make  it  possible  for  blind  Koreans  to  perform  such  computer 
tasks  as  word  processing.  The  committee  met  one  time  while  I  was  still  in  Korea,  and  it  became  clear 
during  that  meeting  that  I  had  a  better  grasp  of  the  work  that  would  have  to  be  done  than  did  the  other 
committee  members.  Accordingly,  we  agreed  that  I  would  take  on  the  work  as  a  project  of  the 
Perceptual  Alternatives  Laboratory,  and  that  I  would  keep  the  other  committee  members  informed 
of  my  progress. 

I  have  set  the  following  objectives  for  the  project  I  agreed  to  undertake.  Korean  letters  should  be 
associated  with  the  keys  on  the  computer  keyboard,  so  that  when  these  keys  are  pressed,  the  Korean 
letters  assigned  to  them  appear  on  the  monitor  screen,  and  are  pronounced  by  the  speech  synthesizer, 
if  the  operator  has  activated  aural  key  monitoring.  The  ASCII  characters  responsible  for  these  letters 
should  be  retained  in  a  file,  so  that  they  can  be  used  as  the  input  to  a  program  whose  output  consists 
of  properly  formed  Korean  characters  that  can  be  displayed  either  on  the  monitor  screen,  or  on  paper 
by  means  of  a  dot  matrix  printer,  operated  in  its  graphics  mode.  In  addition,  the  contents  of  this  file 
can  be  directed  to  a  program,  which  generates  the  signals  that  will  cause  a  braille  embosser  connected 
to  the  computer  to  produce  correct  Korean  braille.  Finally,  the  ASCII  characters  that  are  converted 
into either Korean print characters or Korean braille letters, can be delivered to a text-to-speech
program, which replaces them with the signals that cause the speech synthesizer to generate the
sequences  of  phonemes  that  occur  in  Korean  words,  so  that  the  prompts  that  guide  an  operator’s 
interaction  with  a  computer  can  be  displayed  as  synthesized  Korean  speech. 
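
The step from a file of ASCII key codes to properly formed Korean characters can be sketched as
follows. This is not the authors' program: the key assignments shown follow the common two-set
Korean keyboard layout and are an assumption, as is the use of Unicode arithmetic, which serves here
only as a modern stand-in for whatever character generation the original system used.

```python
# Korean letters in the order used by the Unicode Hangul syllable block.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"              # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"          # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # none + 27 finals

# Hypothetical excerpt of a key table (assumed two-set layout).
KEY_TO_LETTER = {"g": "ㅎ", "k": "ㅏ", "s": "ㄴ", "r": "ㄱ", "m": "ㅡ", "f": "ㄹ"}


def compose(lead, vowel, tail=""):
    """Combine an initial consonant, a vowel and an optional final into one syllable block."""
    index = (LEADS.index(lead) * len(VOWELS) + VOWELS.index(vowel)) * len(TAILS)
    return chr(0xAC00 + index + TAILS.index(tail))


def keys_to_syllable(keys):
    """Turn two or three stored ASCII key codes into one Korean syllable block."""
    letters = [KEY_TO_LETTER[key] for key in keys]
    return compose(*letters)
```

Storing the ASCII key codes rather than the finished characters, as the text describes, means the
same file can feed the screen, the printer, the braille embosser and the speech output, each through
its own conversion.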

As  the  result  of  work  completed  by  Dr  Kilpatrick  and  myself,  most  of  these  objectives  have  been  met. 
Using the programs we have developed, a blind Korean operator can now enter text on the computer
keyboard that is represented on the monitor screen and on paper by properly formed Korean characters.
This  enables  written  communication  with  sighted  co-workers.  If  the  operator  wishes,  the  same  text 
can  be  written  in  braille.  The  blind  operator  who  enters  and  edits  text  is  guided  by  prompts  in  the  form 
of  Korean  speech.  The  repertoire  of  phonemes  used  by  the  Echo  synthesizer  includes  only  the 
phonemes  needed  for  the  pronunciation  of  English  words.  Many  of  these  phonemes  are  also  used  in 
Korean  words,  but  Korean  speakers  use  a  few  phonemes  not  used  by  speakers  of  English.  When  the 
use  of  one  of  these  phonemes  is  indicated,  it  is  replaced  by  the  best  available  approximation  in  the  set 
of  English  phonemes.  The  result,  though  inaccurate,  is  close  enough  to  the  Korean  speech  produced 
by  human  speakers  so  that  listeners  who  know  Korean  can,  with  a  little  practice,  learn  to  identify  the 
prompts  that  guide  their  interaction.  This  task  is  not  as  formidable  as  it  may  seem.  Because  operators 
must  learn  to  identify  only  a  limited  number  of  predetermined  statements,  there  is  a  substantial 
reduction  of  message  uncertainty. 

Like  American  braille,  Korean  braille  is  customarily  written  in  a  contracted  form  in  which  frequently 
occurring  letter  sequences  are  replaced  by  single  characters.  Writing  in  this  way  reduces  the  number 
of  letters  that  must  be  identified  in  order  to  identify  a  word.  This  contracted  braille  is  called  Grade  II 
braille,  and  in  order  to  obtain  a  Grade  II  version  of  text  that  has  been  entered  on  the  computer 
keyboard,  the  text  is  processed  by  a  program  called  a  Grade  II  translator.  This  translator  replaces  each 
word  in  the  original  text  with  the  properly  contracted  representation  of  the  same  word.  We  have  now 
nearly  completed  a  Grade  II  translator  for  Korean  braille.  All  that  remains  is  a  final  stage  of  testing 
and,  if  necessary,  revision. 
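
The core of such a translator, replacing frequent letter sequences by single characters, can be
sketched as a longest-match substitution. The contraction table below is a hypothetical
English-style placeholder, not Korean braille, and a real Grade II translator also needs
context rules that this sketch leaves out.

```python
# Hypothetical contraction table: letter sequence -> single stand-in character.
CONTRACTIONS = {"the": "!", "th": "?", "ing": "+", "and": "&"}


def contract(text, table=CONTRACTIONS):
    """Replace each letter sequence in the table by its contraction, longest match first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # no contraction applies at this position
            i += 1
    return "".join(out)
```

Trying the longest sequences first matters: in "the", the three-letter contraction must win over
the two-letter one for "th".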

At  its  present  stage  of  development,  this  system  can  be  put  to  effective  use  by  blind  Korean  computer 
operators.  However,  it  lacks  the  text-to-speech  program  and  the  complete  repertoire  of  Korean 
phonemes  that  would  give  the  operator  the  ability  to  display  the  contents  of  text  files  as  synthesized 
speech.  Writing  a  text-to-speech  program  should  not  be  difficult,  but  it  cannot  be  done  until  a 
synthesizer  has  been  developed  that  can  generate  all  of  the  phonemes  used  in  the  Korean  language. 
There  are  no  technical  problems  that  have  to  be  solved  in  order  to  know  how  to  make  a  synthesizer 
that  is  adequate  for  the  Korean  language.  It  is  simply  work  that  has  not  yet  been  done.  We  do  not 
have  the  equipment  in  the  Perceptual  Alternatives  Laboratory  that  would  be  needed  to  do  such  work. 
I  tried  to  interest  the  manufacturer  of  the  Echo  synthesizer  in  the  development  of  a  Korean  synthesizer, 
but  I  did  not  succeed.  I  still  hope  to  complete  this  part  of  the  work,  but  my  ability  to  do  so  will  probably 
depend on my success in finding a manufacturer of speech synthesizers or a speech laboratory that I
can  interest  in  collaborating  with  me.  In  the  meantime,  Dr  Kang  and  I  will  be  making  plans  to  teach 
blind  Koreans  to  use  the  system  as  it  presently  stands. 




Human Factors Issues for the Use of Speech Technology in Information Products

Stephen  Furner 

British  Telecom  Research  Laboratories,  England 


Introduction 

There  is  a  convergence  occurring  between  computing  and  telecommunications,  resulting  in  the 
integration  of  media  available  with  which  customers  can  communicate.  This  convergence  is  providing 
higher  band-width  links  than  the  conventional  telephone  service,  and  an  integration  with  computing 
technology  to  enable  new  multi-media  information  technologies  to  be  made  available  between  widely 
separated  geographical  locations  (Timms,  Kee  &  Toncar,  1987).  The  telecommunications  terminals 
that  will  be  used  in  the  future  will  be  able  to  offer  more  than  the  audio  connection  of  the  basic  telephone 
(d’Oultremont  1988).  Speech  based  communication  will  be  integrated  within  a  multi-media  context 
as  part  of  a  bundle  of  facilities  made  available  to  the  customer.  This  bundle  of  facilities  will  include 
the  capacity  to  deal  with  both  spoken  and  visually  based  data.  The  introduction  of  the  integrated 
services  digital  network  (ISDN)  within  the  UK  will  provide  access  to  a  limited  band-width  for  speech 
and data over a switched public network. The European RACE program (Research and development
into  Advanced  Communication  technology  for  Europe)  is  targeted  at  providing  a  switched  broadband 
service  within  Europe  by  1996.  The  broadband  network  will  allow  full  multi-media  communication 
between  locations  attached  to  the  system. 

The  integration  of  media  for  input/output  to  information  systems  for  private  or  public  access  raises 
significant  human  factors  problems  as  well  as  opportunities,  for  both  the  conventional  user  and  the 
user  with  special  requirements  such  as  the  blind.  Within  the  scientific  discipline  of  ergonomics,  tools 
have  been  developed  to  deal  with  the  ease-of-use  issues  at  the  boundary  between  an  information  system 
and  its  user  population.  Often  the  introduction  of  ergonomics  is  little  more  than  following  common 
sense  principles,  such  as  paying  attention  to  what  the  customer  tells  you  about  the  difficulty  he  or  she 
is  having  in  carrying  out  a  typical  task  with  the  device.  A  fairly  obvious  principle,  which  forms  the  basis 
of  user-centred  design,  is  to  try  the  product  out  and  change  it  if  it  does  not  do  what  the  customer  wants! 
A very simple approach, but one which is all too often ignored, at the cost of the uptake of the product
by potential users and of the chance of a technical advance being realised as a commercial advantage.
To allow complex information technology to benefit the widest possible community of
users, and not just a small group of highly skilled technocrats, it is important that there be an ergonomic
input  into  the  design  process.  The  tools  available  within  ergonomics  can  make  a  contribution  to  dealing 
with  the  usability  requirements  for  customers  with  special  needs,  as  well  as  those  of  the  conventional 
customer. 


The  "Direct  Manipulation"  Interface 

The high resolution displays used for the integration of media at a terminal enable an application
designer to exploit the direct manipulation style of interaction between a system and its user. Here the
graphics capacity is employed to construct the metaphor, or conceptual model, through which the user
interacts  with  the  functionality  available  within  the  application.  Commands  are  invoked  to  control  the 
operation  of  the  application  by  the  manipulation  of  the  screen  images.  Typically  the  manipulation 
would  be  via  a  gestural  input  device  such  as  a  mouse  and  a  conventional  QWERTY  keyboard.  A 
popular  example  of  this  type  of  interface  is  that  offered  by  the  Apple  Macintosh  micro-computer.  Here 
the  basic  screen  display  offered  to  the  customer  is  intended  to  represent  a  desk-top.  To  delete  a  file, 
a  representation  of  a  document  is  dragged  across  the  screen  and  deposited  into  a  representation  of  a 
wastepaper basket or dustbin. The obvious reference for this manipulation of the representation is
to throwing away a piece of paper in a conventional office environment. In this example, the
representation  allows  the  user  to  interact  with  the  facilities  offered  by  the  computer,  employing 
common  concepts  from  a  domain  assumed  to  be  familiar  to  the  user. 

The  advantage  of  the  direct  manipulation  interface  for  sighted  users  is  that  the  knowledge  they  have 
in  the  domain  of  the  metaphor  can  be  employed  in  the  task  of  interacting  with  the  information  system. 
The  user  can  employ  existing  knowledge  and  skills  to  communicate  with  the  system,  rather  than  go 
through  an  extended  learning  procedure  to  acquire  new  skills.  This  approach  to  interface  design,  while 
improving  ease-of-use  for  sighted  users,  does  not  necessarily  improve  access  for  users  with  special 
requirements such as the blind. The emphasis on a visual metaphor presented
in graphical form may impose two difficulties that will need to be addressed. Firstly, there are the
psychological  issues  associated  with  the  design  of  the  representation  being  employed  to  construct  the 
metaphor  at  the  interface,  and  secondly,  there  are  technical  issues  associated  with  providing  an 
alternative  display  medium  that  can  be  used  to  present  the  graphical  representations  to  the  blind  user. 

Where audio facilities exist within a multi-media interface an opportunity exists to exploit them so that
a user unable to employ visual interaction can still make use of the facilities that the system provides.
However, this may not be a straightforward task. The degree of access that the audio facilities have
within  the  system  to  interact  with  the  visual  representations  available,  will  be  dependent  upon  the 
architecture  that  has  been  employed  to  build  the  application  and  operating  environment.  If  a  virtual 
screen  technique  has  been  employed  it  may  simplify  the  construction  of  an  alternative  output  form. 
Here,  the  system  maintains  an  internal  representation  of  the  material  being  presented  to  the  user, 
rather  than  displaying  directly  on  the  screen.  The  virtual  screen  can  be  accessed  by  different  output 
devices  and  an  appropriate  interpretation  of  its  contents  made  for  the  presentation  format  being  used 
(Vanderheiden,  1985). 
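
The virtual screen idea can be sketched as a buffer that the application writes to, with separate
renderers reading from it for each output device. The class and function names below are
assumptions made for this illustration and do not describe any particular system.

```python
class VirtualScreen:
    """Internal representation of what is being presented to the user."""

    def __init__(self, rows, cols):
        self.cells = [[" "] * cols for _ in range(rows)]

    def write(self, row, col, text):
        """Application output goes here instead of directly to the display."""
        for offset, ch in enumerate(text):
            if col + offset < len(self.cells[row]):
                self.cells[row][col + offset] = ch

    def lines(self):
        return ["".join(row) for row in self.cells]


def render_visual(screen):
    """Interpretation for a visual display: the full character grid."""
    return "\n".join(screen.lines())


def render_speech(screen):
    """Interpretation for speech output: one utterance per non-empty line."""
    return [f"line {number}: {line.strip()}"
            for number, line in enumerate(screen.lines(), start=1)
            if line.strip()]
```

Because both renderers read the same buffer, adding an output form for blind users does not require
any change to the application that fills it.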

Dialogue  Design 

The  communication  task 

In  interacting  with  a  system,  a  customer  is  typically  entering  into  a  conversational  dialogue  to  achieve 
some specific goal. There are three types of communication task that information systems provide:

a.  person  to  person  -  this  is  immediate  real  time  interaction  between  two  or  more  people; 
this  could  be  a  conferencing  facility  on  an  E-mail  system.  Unix  operating  systems  will 
typically  offer  a  "talk"  or  "chat"  facility  where  a  user  can  employ  the  terminal  to  converse 
in  real  time  with  other  users  logged  into  the  system  at  that  time. 



b.  person-system-person  -  this  is  essentially  a  messaging  activity.  Here  there  is  a  delay 
between  the  sending  of  the  message  and  its  receipt  at  its  destination. 

c.  person  to  system  -  this  would  be  for  a  task  such  as  database  retrieval,  where  the  user  is  not 
communicating  with  any  other  people  on  the  system,  only  with  a  resident  application. 

Blind  users  will  need  to  be  able  to  gain  access  to  information  systems  and  enter  into  these  types  of 
communication  if  they  are  to  make  use  of  computing  technology.  It  is  therefore  important  that  design 
decisions  taken  about  the  logical  structuring  of  information  systems  should  not  preclude  non-standard 
output  devices  from  being  employed.  Structured  information  systems  design  methodologies,  such  as 
JSD,  SSADM,  etc,  need  to  be  able  to  cater  for  the  requirements  of  both  conventional  users  and  users 
with  special  requirements. 


Interacting  by  speech  technology 

Speech  technology  is  becoming  more  widely  used  in  industry  to  provide  hands-free  data  entry  or  to 
allow  the  user’s  vision  to  be  free  from  the  interaction  task.  Visual  inspection  and  stock  control  are 
prime  examples  of  this  type  of  activity  (Baker,  1987).  In  a  visual  inspection  task,  a  user  needs  to  be 
able  to  effectively  interact  with  an  information  system  but  without  taking  significant  visual  effort  away 
from  the  inspection  task. 

The techniques developed within ergonomics to deal with speech based dialogues for conventional
commercial applications, such as rapid prototyping (Furner, 1987a), provide a source of established
technical procedures which could be exploited to aid the blind computer user.


Ergonomic  design  methods 

Typical  procedures  used  for  introducing  ergonomics  into  the  design  of  a  speech  based  interface  are: 

a.  User  centred  design 

b.  Rapid  prototyping 

c.  Usability  engineering 

These  procedures  can  be  applied  to  deal  with  ease  of  use  factors  within  the  information  technology. 
However,  the  degree  of  effect  that  these  procedures  can  have  is  dependent  upon  the  stage  in  the 
product  design  at  which  they  are  introduced  (Furner,  1987b).  If  they  are  introduced  late  in  the  life  of 
a  project  then  the  scope  for  change  may  be  small,  and  the  ergonomic  input  will  be  limited  to  fine  tuning 
the  design  or  employing  training  material  to  deal  with  any  significant  human  factors  problems. 


Psychological  issues 

It  is  important  to  be  aware  of  the  limitations  of  application  of  conventional  techniques  as  well  as  the 
areas  in  which  they  can  be  implemented.  The  models  and  representations  used  to  provide  access  to 
information  systems  for  sighted  users  may  not  be  appropriate  for  non-sighted  users.  Blind  users  may 
employ  differing  cognitive  strategies  when  dealing  with  information  being  presented  by  the  system 
than conventional users. Within the context of graphical interfaces, the representations of spatial
information employed by blind users may not be the same as those employed by sighted users.




More research is required into the psychology of blind users in the context of human computer
interaction. Engineers need information about the psychological issues relevant to the leading edge,
and estimates of the next generation of technology, if it is to be useful in a design context (Furner, 1988).
Designers  of  informatics  systems  can  only  avoid  inadvertently  building  in  features  that  make  it 
unnecessarily  difficult  for  the  user  to  interact  with  a  system,  if  they  are  aware  of  the  requirements  of 
the  user  population.  This  is  the  case  for  both  the  ordinary  user  and  also  users  with  special  requirements 
such  as  the  blind. 

Since  speech  is  transitory  it  is  important  that  in  any  dialogues  where  it  is  employed  as  the  sole  means 
of  interaction,  the  implications  of  the  load  this  places  on  the  working  memory  of  the  user  be  given 
detailed  consideration. 


Dialogue  characteristics 

Dialogue  design  for  a  speech  system  is  not  a  simple  task  that  can  be  described  with  an  easy  set  of  rules. 

However, the following is a list of rules of good practice which can prove useful:

a.  Complexity  -  the  dialogue  should  be  simple  and  obvious  to  use  from  the  surface 
information  provided  to  the  user.  Avoid  inexplicable  mode  changes  or  leaving  the  user 
confused  about  what  is  going  on. 

b.  Consistency  -  the  dialogue  should  be  consistent  within  its  operation.  Commands  issued  in 
different  parts  of  the  dialogue  should  not  change  their  function.  If  a  voice  change  is  used 
to  indicate  a  mode  change  then  it  should  do  this  in  all  parts  of  the  dialogue  where  the 
change  is  possible. 

c. Audio cues and prompts - cues and prompts should be short, simple and obvious in their
meaning.

d.  Response  times  -  the  response  time  from  the  system  should  be  appropriate  to  the  task  that 
is  being  carried  out  (Furner,  1983).  A  long  response  time  can  imply  turn  taking  to  the 
user. If a long response time is going to occur, warn the user.

e.  Media-integration  -  where  alternative  output  forms  are  available,  use  them  to  reduce  the 
memory  load  on  the  user  when  interacting  with  the  system. 


Conclusion 

The  integration  of  speech  into  multi-media  interfaces  offers  an  opportunity  to  use  speech  technology 
for  interaction  with  the  system  by  blind  users.  Conventional  ergonomic  techniques  can  be  applied  to 
deal  with  ease  of  use  issues  for  the  blind  where  they  interact  with  computing  systems.  More  research 
is  required  into  the  psychology  of  the  blind  computer  user  and  the  introduction  of  the  requirements 
of  users  with  special  needs  into  structured  systems  design  methodologies  such  as  JSD,  SSADM,  CORE, 
etc. 




References 

Baker, J. (1987) State-of-the-art speech recognition US research and business update, Proc. European
Conference on Speech Technology, Edinburgh.

Furner, S.M. (1983) Response delay time on interactive viewdata information systems, Proc. 10th
International Symposium on Human Factors in Telecommunications.

Furner, S.M. (1987a) Rapid prototyping as a design tool for dialogues employing voice recognition, Proc.
European Conference on Speech Technology, Edinburgh.

Furner, S.M. (1987b) Practical information about interfacing operating characteristics for engineering
design, IEE Colloquium Digest 1987/38.

Furner, S.M. (1988) Communicating by telephone, by D Rutter, book review: British Journal of
Psychology, 79, 4.

D’Oultremont, P. (1988) The RACE programme: The European route towards Integrated broadband
Communications, Telecommunications Policy, June.

Timms, S., Kee, R. & Toncar, J. (1987) Broadband communications: The commercial impact, Ovum
Ltd.

Vanderheiden, G.C. (1985) Alternative access to all standard computers for disabled and non-disabled
users, Proc. RESNA 8th Annual Conference, Memphis, Tennessee.


Acknowledgement  is  made  to  the  Director  of  Research,  British  Telecom  Research  Laboratories,  for 
permission  to  publish  this  paper,  also  to  BT  Action  for  Disabled  Customers  for  their  assistance. 


Digital  Daily  Newspapers  for  Blind  People  in  Sweden 

Jan-Ingvar  Lindstrom 
Handikappinstitutet,  Sweden 




Information  from  daily  newspapers  of  different  political  colour  is  of  paramount  importance  for  the 
democratic  process  in  all  developed  countries.  This  information  is  as  important  to  visually  impaired 
people - VIPs - as it is to sighted people. Since 1982 a state commission - The Swedish State Committee on
Spoken  Newspapers  -  has  investigated  the  premises  and  conducted  pilot  studies  on  how  to  provide 
VIPs  with  dailies.  The  Commission  has  finished  its  work  and  a  permanent  body  has  been  established 
to  supervise  the  continuation. 

There  are  three  different  methods  in  use  in  Sweden  today,  viz: 


1.  Cassette  papers  distributed  by  mail 

These  are  90  minute  excerpts  of  a  specific  newspaper,  edited  by  journalists  at  the  newspaper  and  then 
recorded,  and  copies  are  distributed  by  ordinary  mail. 

2.  Cassette  papers  distributed  by  radio 

They are made on the same premises as the above mentioned but are distributed via the broadcast
network during the night to those VIPs who have a special receiver. The receiver records the broadcast
information automatically on a cassette playback machine.


3.  Unabridged  Dailies  according  to  the  RAPS/RATS  concept 

This  stands  for  Radio  distributed  Braille  Papers  for  the  Blind/Radio  distributed  Speech  Synthesis 
Papers for the Blind. The principle of the RAPS/RATS method is to extract the full text from the
compositor’s  computer  of  the  newspaper  in  question  and  transmit  the  digitised  information  via  the 
broadcast  network  during  the  night.  The  information  is  received  and  stored  in  hard  disk  memory  in 
the  home  of  the  VIP.  With  the  aid  of  a  search  procedure,  much  like  the  one  used  for  the  retrieving  of 
information  from  databases,  any  piece  of  information  could  be  read,  either  on  a  transitory  braille 
display  or  with  a  speech  synthesiser. 

There  is  a  fourth  information  channel  exclusively  for  deaf-blind  people,  where  short  news  is  stored 
digitally  in  a  floppy  disk  memory  of  a  PC  connected  to  the  telephone  network  by  a  modem.  Deaf-blind 
people  in  possession  of  a  paperless  braille  machine,  connected  to  the  telephone  network,  can  call  the 
computer  and  read  the  information  sequentially. 


The  RAPS/RATS  method  is  by  far  the  one  that  gives  the  best  access  to  any  piece  of  information  in  a 
newspaper.  The  principle  is  illustrated  in  Figure  1. 




Fig.  1.  Lay-out  of  the  system  for  the  provision  of  Digital  Daily  Newspapers  for  VIPs. 


Given  a  composing  computer  at  the  newspaper’s  printing  house,  where  all  the  text  information  is 
stored,  the  text  is  extracted,  re-formatted  and  transmitted  to  a  radio  station  where  broadcast  of  regular 
radio  programmes  takes  place.  In  a  period  during  the  night,  when  the  broadcasting  of  ordinary 
programmes  is  not  going  on,  the  digital  information  is  transmitted. 

The  transmission  is  very  quick  -  only  a  few  minutes  for  all  the  ordinary  dailies.  People  who  normally 
receive  programmes  from  the  transmitter  can  also  receive  text  information  from  RAPS/RATS 
provided  they  have  a  special  kind  of  equipment.  This  equipment  consists  of  an  FM  receiver,  a 
microcomputer  with  a  winchester  disc  and  a  keyboard.  A  speech  synthesiser  and/or  a  braille  display 
is  also  connected  in  some  way. 

Since  the  autumn  of  1987,  35  visually  impaired  persons  have  used  the  system  on  a  daily  basis  -  they 
subscribe to the newspaper Göteborgs-Posten. They use a piece of equipment which has a built-in
speech  synthesiser  and  a  small  light-weight  braille  keyboard.  This  keyboard,  which  weighs  only  about 
3  lbs  (1.5  kg)  is  connected  with  the  rest  of  the  equipment  via  a  long  cable.  This  enables  the  user  to  be 
relatively  free  to  read  the  paper  eg  at  the  writing  table,  in  a  favourite  armchair,  or  in  bed. 

The  present  period  has  been  preceded  by  several  pilot  studies,  and  so  about  80  VIPs  have  tested  the 
system  over  at  least  a  four  month  period.  Several  evaluations  have  been  made  and  reported.  They  all 
show  that  the  level  of  satisfaction  is  very  high,  although  some  problems  do  remain.  A  question  of 
paramount  importance  is  the  way  the  information  is  retrieved.  Much  attention  has  been  paid  to  that 
part  of  the  problem  by  the  designer  Mr  Henryk  Rubenstein  at  the  Chalmers  University  of  Technology. 
The  structure  is  shown  briefly  in  Figure  2. 

There are about eight million people in Sweden. It is estimated that about 20,000 of them request
some kind of spoken or braille daily because of sight impairment. So far, there are about 1,200
subscribers to cassette papers and, as mentioned, 35 subscribers to RATS versions. The cost of
producing the cassette papers has so far been paid by the Government. It is estimated at an
average of about £100,000 per annum for each newspaper with a few hundred subscribers. The cost
of a RATS version is not known because the method has not yet been developed to its final stage. A total
sum of £5 million annually is now allocated and granted by the Government for distributing daily papers
for blind people in Sweden - including RATS versions.




Applications  of  Vocoder  Techniques  for  the  Blind 

Paolo  Graziani 

Istituto  di  Ricerca  sulle  Onde  Elettromagnetiche,  Italy 


Introduction 

One of the most popular media for storing texts for the blind is magnetic tape. This is the least
expensive and the easiest way to convert a text into a form accessible to a blind person. On the other
hand, the sequential access typical of this medium creates some difficulties in retrieving information,
especially for a dictionary, an encyclopedia or any other case in which random access is required.

New possibilities of text storage, directly in the form of coded characters with translation into synthetic
speech, are now offered by informatics. This ensures random access but, unfortunately, the present
quality of text-to-speech systems is not adequate for the needs of a blind listener in every kind of use.

In  fact,  while  a  blind  computer  user  usually  accepts  the  limited  quality  of  synthetic  speech  in  interaction 
with  the  machine,  the  problem  of  speech  quality  becomes  important  when  the  speech  is  used  to  listen 
to  long  texts,  such  as  literature.  The  monotonous  and  clearly  artificial  sound  creates  severe  problems 
of  attention  keeping  and  fatigue. 

The digital storage of sound, such as that used for music, would ensure both random access
and a very good quality of speech, but this appears too expensive, at least for a small number of copies
of any one text. In fact, the only available technology useful for this purpose is that of compact disks.
But the high cost of the master and the limited capacity of such a medium, in terms of playing time under
the present coding standard, make it feasible and practicable only for a large number of copies
of relatively short texts.

A  good  compromise  between  needs  of  random  access  and  quality  of  speech  could  be  represented  by 
the  use  of  digital  vocoder  techniques.  This  would  ensure  a  solution  to  the  problem  of  digital  coding 
of  speech  that  is  not  too  expensive.  Discussed  below  are  some  technical  aspects  of  the  vocoder 
technique  and  its  application,  both  to  the  storage  of  information  and  to  create  an  alternative  audio 
channel  for  television  through  Teletext. 


The  Vocoder 

A digital "vocoder" (voice coder and decoder) is a technique in which a double conversion,
analogue-to-digital (AD) and digital-to-analogue (DA), is applied to the voice. The analogue
electrical signal is transformed into a sequence of digital numbers, which are transmitted through a
digital communication channel or stored in a memory, and the inverse transformation is applied after
the data are received or retrieved.

The simplest coding is "pulse code modulation" (PCM), in which the value assumed by the analogue
signal is taken at a certain sampling frequency and these samples are digitally represented with a
number of bits, according to the precision required. The lowest values for speech coding are a
frequency of 8000 samples/sec and 8 bits for each sample, so that the lowest bit-rate required is 64
Kbits/sec.

Several  techniques  of  data  compression  have  been  developed  to  reduce  the  bit-rate  without  the 
degradation  of  speech  quality.  The  most  popular  are  the  ADPCM  (adaptive  differential  PCM),  which 
permits  a  compression  factor  of  4-5,  and  the  "linear  prediction  code"  (LPC),  which  is  much  more 
powerful  with  a  compression  factor  of  20-50. 
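The figures quoted above can be checked with a few lines of arithmetic. The following is only a sketch: the sampling values and compression factors are those given in the text, while the function and variable names are illustrative.

```python
# Uncompressed PCM bit-rate and the rates implied by the quoted compression factors.
SAMPLE_RATE_HZ = 8000   # samples per second (from the text)
BITS_PER_SAMPLE = 8     # bits per sample (from the text)


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit-rate of plain PCM in bits per second."""
    return sample_rate_hz * bits_per_sample


def compressed_rate(bit_rate, factor):
    """Bit-rate left after dividing by a compression factor."""
    return bit_rate / factor


pcm = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)

# Compression factor ranges quoted in the text: 4-5 for ADPCM, 20-50 for LPC.
adpcm_range = (compressed_rate(pcm, 5), compressed_rate(pcm, 4))
lpc_range = (compressed_rate(pcm, 50), compressed_rate(pcm, 20))

print("PCM   :", pcm, "bits/sec")
print("ADPCM :", adpcm_range, "bits/sec")
print("LPC   :", lpc_range, "bits/sec")
```

On these assumptions ADPCM would need between 12.8 and 16 Kbits/sec, and LPC between 1.28 and 3.2 Kbits/sec.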

In  the  LPC  method,  each  sample  of  the  voice  is  "predicted"  by  means  of  a  linear  combination  of  a 
certain  number  of  previous  samples.  The  coefficients  of  this  linear  combination  are  computed 
according  to  a  mathematical  model  of  the  human  vocal  apparatus.  These  coefficients,  or  other 
equivalent  parameters,  are  transmitted  or  stored  instead  of  the  samples  of  the  voice  and  they  can  be 
updated  with  a  variable  frequency,  according  to  the  characteristic  of  the  signal  to  obtain  a  greater  effect 
of compression. A typical value of bit-rate which can be obtained by using LPC is 2400 bits/sec. These
values  will  be  used  below  to  discuss  possibilities  of  applying  this  technique  in  the  field  of  technology 
for  the  blind. 
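The idea of predicting each sample from previous ones can be illustrated with a very small sketch. It is not the analysis used in a real vocoder: it assumes a predictor of order two only, fits the coefficients by ordinary least squares rather than from a model of the vocal apparatus, and uses a pure tone as a hypothetical stand-in for a frame of voiced speech. All names are illustrative.

```python
import math


def fit_predictor_order2(x):
    """Least-squares fit of a1, a2 so that x[n] ~ a1*x[n-1] + a2*x[n-2]."""
    r11 = r12 = r22 = b1 = b2 = 0.0
    for n in range(2, len(x)):
        p1, p2 = x[n - 1], x[n - 2]
        r11 += p1 * p1
        r12 += p1 * p2
        r22 += p2 * p2
        b1 += x[n] * p1
        b2 += x[n] * p2
    # Solve the 2x2 normal equations by Cramer's rule.
    det = r11 * r22 - r12 * r12
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (b2 * r11 - b1 * r12) / det
    return a1, a2


def residual(x, a1, a2):
    """Prediction error for every sample that has two predecessors."""
    return [x[n] - (a1 * x[n - 1] + a2 * x[n - 2]) for n in range(2, len(x))]


SAMPLE_RATE_HZ = 8000   # sampling frequency quoted in the text
TONE_HZ = 200           # hypothetical test tone
FRAME_LEN = 160         # hypothetical frame length (20 ms at 8000 samples/sec)

omega = 2 * math.pi * TONE_HZ / SAMPLE_RATE_HZ
frame = [math.sin(omega * n + 0.3) for n in range(FRAME_LEN)]

a1, a2 = fit_predictor_order2(frame)
max_error = max(abs(e) for e in residual(frame, a1, a2))
print(a1, a2, max_error)
```

For a pure tone the fit recovers a1 = 2cos(omega) and a2 = -1, and the prediction error is practically zero, so two coefficients describe the whole frame of 160 samples. Real speech needs more coefficients and leaves a residual, but the source of the compression is the same.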


Applications  in  Digital  Recording 

When we consider a new technology for the blind, we must take into account an important factor
affecting the real possibilities of success: it is necessary to avoid the development of special hardware,
since it would be very expensive due to the limited market. Thus it is not realistic, at this time, to
think of developing a special CD player, with a different standard from that used for musical compact
disks, in order to use this medium for speech recording with a longer duration.

Fortunately,  the  compact  disk  is  becoming  a  mass  storage  medium  for  personal  computers.  This  allows 
us  to  consider  the  possible  use  of  it  as  a  new  form  of  talking  book.  In  fact,  in  this  case,  the  rate  of  data 
retrieval  is  controlled  by  a  program  and  any  new  application  of  CD-ROM  becomes  a  problem  of 
software  development. 

A CD-ROM has a capacity of about 550 Mbytes. If used as a medium for recording coded speech,
assuming a bit-rate of 2400 bits/sec, it ensures a duration of about 500 hours. CD-ROM technology
appears suitable only for large scale production because of the high cost of the master. For a small
number of copies, other optical disk technologies, such as WORM (Write Once, Read Many), appear
more convenient, since the cost of each copy does not depend on the number produced.

For short texts, a floppy disk can also provide sufficient capacity. In terms of time, the
capacity of a floppy disk ranges from 20 minutes, for normal 5" ones, up to 80 minutes, for the high
density 3.5" disks from the latest family of personal computers.
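The durations quoted for the CD-ROM and for floppy disks follow directly from the bit-rate of 2400 bits/sec. The sketch below repeats the calculation. The CD-ROM capacity is the one given in the text; the floppy disk capacities (360 Kbytes and 1.44 Mbytes) are not stated in the text and are assumed here as the usual formatted sizes of such disks.

```python
LPC_BIT_RATE = 2400  # bits per second (from the text)


def speech_seconds(capacity_bytes, bit_rate=LPC_BIT_RATE):
    """Seconds of coded speech that fit into a given number of bytes."""
    return capacity_bytes * 8 / bit_rate


CD_ROM_BYTES = 550 * 10**6          # "about 550 Mbytes" (from the text)
FLOPPY_5_BYTES = 360 * 1024         # assumption: 360 Kbyte 5" disk
FLOPPY_35_HD_BYTES = 1440 * 1024    # assumption: 1.44 Mbyte 3.5" high density disk

cd_hours = speech_seconds(CD_ROM_BYTES) / 3600
floppy_5_minutes = speech_seconds(FLOPPY_5_BYTES) / 60
floppy_35_minutes = speech_seconds(FLOPPY_35_HD_BYTES) / 60

print(f"CD-ROM      : {cd_hours:.0f} hours")
print(f'5" floppy   : {floppy_5_minutes:.1f} minutes')
print(f'3.5" floppy : {floppy_35_minutes:.1f} minutes')
```

With these assumed capacities the results agree with the rounded figures in the text: a little over 500 hours for the CD-ROM, and roughly 20 and 80 minutes for the two kinds of floppy disk.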

These values result from the assumption of the use of the LPC technique with a bit-rate of 2400 bits/sec.
The use of such a code requires a decoder device to produce real time speech output.
Such devices are available in the form of cards for the IBM PC. By using this solution, no new hardware
development is necessary, at least to produce an experimental digital talking book reader.
Cards for the IBM PC with an analogue-digital converter and analysis programs for computation of LPC
parameters are also available, so there are no technical problems in the preparation of disks with coded
voice. In addition to the possibility of random access to the text, voice coded with the LPC technique
can be played back at a variable speed without pitch distortion; that is, the text can be listened to
with good intelligibility at a much greater speed than the original one. This is a feature
very much appreciated by blind people.

Another  feature  of  LPC  technique  is  the  possibility  of  speaking  with  a  pitch  different  from  that  of  the 
original  voice.  This  can  be  interesting  for  blind  persons  who  also  have  a  hearing  impairment,  to  adapt 
the spectrum of the voice to the frequencies they are able to hear. Other techniques, such as ADPCM,
which is adopted in some commercially available cards to store spoken messages on a disk, are not as
effective as LPC for this purpose because of their low compression factor and lack of flexibility.


An  Application  for  Television 

Blind people are usually able to follow television programmes when they are adequately supported by
verbal comment, such as in certain news programmes. However, they find it difficult to
understand the plot of a film or any other programme in which long sequences consist only of images
with music and no speech. In these cases, a blind person needs an additional description of the scene
in  the  same  way  as  deaf  people  need  subtitles  outlining  the  speech  they  cannot  hear.  This  additional 
information  must  be  optionally  available  to  interested  users  and  must  not  be  a  nuisance  to  other  users. 
A  technical  solution  to  this  problem  could  be  represented  by  a  parallel  radio  channel,  or  a  sub-carrier 
frequency  of  the  TV  channel  itself,  but  this  appears  expensive  and  it  involves  complex  technical 
problems. 

In Italy and in other countries a good solution for producing subtitles for deaf viewers has been found
by using Teletext. The same medium could be used to provide an alternative audio channel for blind
listeners, carrying speech coded by a vocoder.

This could be accomplished with only a very small reduction of the information rate available for other
Teletext services. For example, Italian Teletext (called Televideo) transmits about 16 pages/sec; each
page consists of 24 lines of 40 characters each, so that the total bit-rate is about 128 Kbits/sec. As
mentioned above, by using the LPC method, we can obtain a vocoder with a satisfactory speech quality
with a bit-rate of 2400 bits/sec or less; so we can have an audio channel by using not more than 2 percent
of the total information rate, with one page every 3-4 sec devoted to a coded sequence of speech.
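The Teletext budget can be worked through in the same way. The page rate, page size and LPC bit-rate below are taken from the text; the figure of 8 bits per transmitted character is an assumption made for this sketch, and it gives a total slightly below the rounded 128 Kbits/sec quoted above.

```python
PAGES_PER_SECOND = 16   # from the text
LINES_PER_PAGE = 24     # from the text
CHARS_PER_LINE = 40     # from the text
BITS_PER_CHAR = 8       # assumption: one byte per transmitted character
LPC_BIT_RATE = 2400     # bits per second (from the text)

# Payload of one page and of the whole Teletext stream.
bits_per_page = LINES_PER_PAGE * CHARS_PER_LINE * BITS_PER_CHAR
teletext_bit_rate = PAGES_PER_SECOND * bits_per_page

# Fraction of the stream taken by the speech channel, and how many
# seconds of coded speech a single page can carry.
speech_share = LPC_BIT_RATE / teletext_bit_rate
seconds_of_speech_per_page = bits_per_page / LPC_BIT_RATE

print(f"Teletext bit-rate : {teletext_bit_rate} bits/sec")
print(f"Speech share      : {speech_share:.2%}")
print(f"Speech per page   : {seconds_of_speech_per_page:.1f} sec")
```

Under these assumptions one page holds 7680 bits, the stream carries about 123 Kbits/sec, the speech channel takes just under 2 percent of it, and one page of coded speech lasts 3.2 sec, which agrees with the estimate of one page every 3-4 sec.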

Furthermore,  if  we  take  into  account  that  this  additional  channel  does  not  have  to  produce  full-time 
speech,  but  only  in  absence  of  speech  in  the  ordinary  audio  TV  channel,  this  percentage  may  decrease 
according  to  the  duty  cycle.  In  fact,  it  is  necessary  to  avoid  the  simultaneous  transmission  of  comments, 
through  this  additional  audio  channel,  and  the  normal  dialogue  of  the  broadcast  since  this  would  create 
problems  of  divided  attention  between  two  different  sources  of  information.  From  the  practical  point 
of  view,  at  least  in  an  experimental  phase,  the  receiving  system  could  be  implemented  on  a  personal 
computer,  by  using  commercially  available  hardware  such  as  decoders  for  both  teletext  and  LPC  voice. 


Incidentally, we can observe that an additional audio channel could have other applications, such as a
translation of the dialogue of a film into another language. This could enlarge the market of users and
consequently decrease the cost of industrial production of a voice decoder integrated into television
sets.

Conclusion 

From  the  proposals  briefly  presented  and  discussed  above,  it  appears  at  least  possible  to  take  into 
consideration  the  LPC  vocoder  as  a  new  audio  technology  for  the  blind.  Experimental  applications  of 
this  technique  can  be  realized  by  using  commercially  available  hardware  and  software,  with  the 
development  of  only  some  special  programs  to  deal  with  the  coded  voice  according  to  the  needs  of  the 
blind  user. 

Regarding  the  use  of  talking  books  in  such  a  form  (apart  from  the  problems  of  the  production),  the 
user  has  to  accept  the  use  of  a  personal  computer  in  order  to  read  a  book.  This  is  certainly  less 
comfortable  compared  with  the  use  of  a  normal  cassette  recorder  or  CD  player,  but  it  offers  other 
advantages. 

Concerning  the  production  of  an  additional  audio  channel  for  television,  the  organisational  problems 
(bearing  in  mind  the  necessity  for  synchronisation  and  compatibility  with  the  normal  audio  channel) 
are  certainly  more  difficult  to  overcome  than  the  technical  ones. 

Anyway, an experiment on all these aspects of the potential applications of the vocoder technique
would show the real impact of such an application on the lives of the blind.




Telework  Projects 

Sean  Kenny 

National  Rehabilitation  Board,  Ireland 


It  is  generally  recognised  throughout  the  world  that  one  person  in  every  ten  has  a  substantial  disability. 
It is the objective of every country to reduce as far as possible the incidence of disability and, for
persons who become disabled, to provide a diagnostic, treatment, rehabilitation and, where necessary,
social support service.

There are many agencies in Ireland, both public and private, involved in areas of disability prevention
and  treatment  and  in  the  general  process  of  reducing  the  disabling  effects  of  disease,  injury  and 
impairment.  The  National  Rehabilitation  Board  is  one  of  the  primary  organisations  engaged  in  this 
field  and  has  special  responsibility  for  promoting  the  interests  of  persons  with  disability,  providing 
certain  services  and  advising  the  Minister  for  Health  on  matters  of  concern  to  disabled  persons  and  on 
policy  issues.  Its  key  task  is  to  promote  and  pursue  the  development  of  policies,  programmes  and 
services  which  will  help  to  remove  any  barriers  impeding  disabled  persons  from  achieving  the  fullest 
possible  measure  of  social  participation  and  independence.  This  task  is  undertaken  through  the 
co-ordination  of  rehabilitation  service  provision,  the  promotion  of  policies  which  facilitate 
independence  for  disabled  persons  and  also  by  the  direct  provision  of  a  range  of  services  to  disabled 
persons  and  agencies. 

Affirmative  action  in  the  public  and  private  sectors  is  essential  in  order  to  improve  or  even  maintain 
the  employment  prospects  of  disabled  persons  at  a  time  of  high  unemployment.  Very  many  of  the 
disabled  persons  who  seek  help  with  job-finding  through  our  vocational  service  are  long-term 
unemployed  and  will  face  considerable  difficulty  in  finding  work  without  specialist  supports  even  when 
the  overall  employment  levels  improve.  In  acknowledgement  of  this,  the  National  Rehabilitation 
Board  initiated  a  range  of  innovatory  projects  and  maintained  special  initiatives  to  assist  the  disabled 
job-seeker.  We  are  pleased  to  report  that  last  year  over  1,000  disabled  persons  were  helped  to  find 
employment  as  a  result  of  these  programmes. 

One of these innovatory projects is called TeAPot (Teleworking Applications and POTential), funded
from the EC STAR Regional Fund for the development of Action Research on Technology -
"Telework", or as it is called here in England, "remote work". We asked some searching questions
before  we  took  on  this  project,  like:  Computers  in  the  service  of  disabled  persons  -  how  will  different 
groups  of  disabled  persons  be  able  to  avail  of  computer  technology?  What  does  the  future  have  in 
store? 

I  hope  to  share  with  you  some  of  the  practical  uses  of  this  technology  to  enhance  a  person’s  well  being 
and  give  a  greater  sense  of  independence.  The  Telework  Feasibility  Project  I  mentioned,  and  on  which 
I  am  currently  working,  is  one  of  the  new  uses  of  this  technology.  I  hope  to  have  a  full  report  with 
guidelines  and  recommendations  on  how  we  can  use  telework  both  from  the  point  of  view  of  the 
employer  and  the  employee  -  this  will  be  completed  in  March  1989.  Telework  is  where  a  person  works 
from home with their computer linked to a company by means of a modem. Britain maintains that by
1995, 15%-20% of its working population will be working this way. This is one way to assist some
disabled persons to get work when they cannot physically get to the workplace themselves or have
someone to take them there. The disabled person and his/her minder can do a lot of work on the
computer but they may not have the chance to try it under normal circumstances.

Two  of  the  ten  people  I  placed  this  year  as  teleworkers  are  visually  impaired  as  a  result  of  multiple 
sclerosis.  One  was  a  social  worker  by  profession.  He  is  now  working  as  a  clerical  officer  in  a  local 
authority  as  a  data  entry  person,  using  an  IBM  computer  and  Lyon  software  with  large  print.  The  other 
person  was  a  sailor  in  the  Navy.  He  had  been  retained  as  a  clerical  officer,  doing  data  entry  and  using 
Lyon  large  print  software. 

Another way the computer can help the visually impaired person is in home-based training. When a
person  cannot  get  to  the  conventional  training  centre  a  specially  adapted  computer  with  tutorial 
package  can  help  to  equip  the  person  to  sell  themselves  for  that  job.  This  not  only  helps  the  disabled 
person to train and gain a new skill, but the minder as well. The person trains in their own environment,
which is more conducive to learning.

Some  people  say  that  telework  is  isolation  at  its  best.  I  would  not  say  so  -  it  just  means  that  one  has  to 
manage  one’s  time  for  oneself.  The  work  is  not  a  nine  to  five  situation  so  he/she  can  do  eight  hours  a 
day  whenever  it  suits  their  mood  and  social  life. 

I view computers as an employment opportunity for the visually impaired. If the disabled person can
find  some  way  of  interacting  with  the  computer,  then  anything  one  can  programme  it  to  do  becomes  a 
job  opportunity.  This  then  gives  the  person  greater  independence  to  be  able  to  do  things  they  like  to 
do,  because  they  have  now  got  some  money  in  their  pocket.  This  adds  quality  to  life  and  this  is  very 
important  for  one’s  well-being.  The  blind  have  used  computer  technology  in  such  a  way  that  work  itself 
has  become  more  meaningful.  They  are  able  to  use  computers  as  programmers  and  in  cottage 
industries  of  all  kinds.  Many  adaptive  devices  and  programmes  are  now  on  the  market  and  are  coming 
down  in  price.  About  eight  years  ago  two  young  blind  pesons  started  working  in  two  of  our  international 
banks  in  Dublin,  as  programmers.  One  was  using  an  Optacon  with  his  computer.  This  was  slow  but 
he  was  able  to  do  his  work.  He  now  uses  a  speech-synthesiser  that  has  helped  him  speed  up  his  work 
and reduce his error rate. He is now a senior programmer, having got two promotions in eight years. I
put  this  down  to  the  increased  development  in  computer  software  and  hardware.  We  now  have  nine 
visually  impaired  persons  working  as  programmers  in  a  few  organisations  in  Dublin. 

Training is very important and it is necessary that training centres have enough funds to be able to keep
up-to-date  and  are  able  to  install  these  new  devices.  I  do  not  think  professional  people  should  give  up 
so  easily  on  severely  disabled  people  these  days  because  the  computer  can  open  up  so  many  doors  for 
them, eg the deaf/blind. I am working with a girl who is deaf and has very little sight. She uses a CCTV
and I have Lyon large print software installed on the computer - this way she is able to read the screen
and do some work like writing for magazines and local newsletters. I know this is only
one case, and it is a pilot case, but I am hopeful that it will lead to others being considered for trying the
computer.

The  problems  of  the  blind  and  the  visually  impaired  in  the  use  of  computers  can  be  solved  only  at  the 
level  of  technology  itself.  It  is  a  major  drawback  to  blind  and  visually  impaired  persons  in  the  computer 
field when they have to approach an employer with "I can do the job but you are going to have to lay
out IR£10,000 up front for me". This is one reason why we should encourage more computer engineers
at  universities  and  polytechnics  to  get  involved  in  practical  research  to  find  ways  to  develop  technical 
aids  for  the  blind  and  other  disabled  persons.  There  should  be  a  lot  of  redundant  hardware  about,  as 
companies  upgrade  their  hardware.  This  can  be  used  in  many  ways.  If  the  equipment  is  not  working, 
then  it  can  be  stripped  and  the  working  parts  can  be  registered  into  a  technical  database  which  could 
be  housed,  say,  at  the  RNIB  or  the  BCA.  This  database  could  be  searched  by  any  of  the  technical 
research  units  for  specific  parts  from  the  inventory.  This  would  keep  research  costs  down  and  would 
also  encourage  more  people  to  get  involved. 

There  should  also  be  good  redundant  micros  and  PCs  out  there,  in  industry,  that  could  be  put  into 
training  centres  and  good  use.  I  would  stress  that  the  real  emphasis  should  be  on  getting  more  people 
to  design  projects  that  are  useful.  The  hardware  and  software  are  there  but  they  have  to  be  put  together 
into  a  project  that  is  useful.  So  what  is  needed  is  for  the  engineers  to  receive  good  input  from  the  blind 
community  to  help  them  identify  practical  needs  to  enhance  their  creativity. 

We  are  coming  close  to  the  transparent  computer.  It  would  have  a  significant  impact  on  the  blind  and 
visually  impaired  community.  Transparent  means  that  the  computer  is  doing  a  lot  of  operations  that 
you  don’t  have  to  be  aware  of  -  they  just  happen.  There  are  people  developing  software  right  now  that 
is getting us closer to the stage where we won’t need special programmes. Should the predicted wave
of  the  future  come,  the  blind  should  be  ready  for  it.  Being  ready  for  it  means  being  at  decision-making 
levels  in  the  technology  field.  I  feel  that  blind  people  should  move  into  the  area  of  systems  analysis  to 
put  them  at  the  decision-making  levels  in  their  companies.  "Graphics  will  make  a  difference,  and  they 
are  going  to  hurt  us",  one  blind  person  warns.  This  is  why  blind  people  need  to  get  into  systems  analysis 
and  not  be  programmers  per  se. 

Since  the  early  1980s,  when  the  PC  became  readily  available  there  have  been  great  expectations  of  its 
use  in  aiding  disabled  persons.  While  a  lot  has  been  done  in  the  education  and  training  field,  and  some 
good  work  in  the  field  of  technical  adaptations,  we  must  not  become  complacent  because  there  is  a 
great  deal  more  to  be  done  and  we  have  to  keep  encouraging  our  computer  engineers  to  get  more 
involved. 

It has been said that computer technology might become a new barrier between disabled persons and
the non-disabled. Technology for the non-disabled population runs ahead: it extends the
capabilities of able-bodied individuals, increasing their efficiency and effectiveness and providing them
with new potential.

Engineers  and  designers  are  doing  wonderful  things  to  make  technology  accessible  to  the  disabled 
population  but  it  is  never  quite  fast  enough.  While  the  computer  is  advancing  handicapped  persons 
two  steps  through  the  use  of  special  programmes,  it  is  advancing  everyone  else  in  society  five  steps. 
Moreover, the five steps are being designed in such a way that the handicapped individuals cannot
take  advantage  of  them,  thereby  leaving  them  actually  three  steps  behind. 

One  of  the  most  difficult  things  for  a  visually  impaired  programmer  is  keeping  up  with  changes  in  the 
field.  Much  reading  is  required  to  learn  about  the  latest  releases  of  computer  systems.  I,  myself,  set 
aside  about  3-4  hours  per  week  for  reading  and  even  then  I  feel  I  am  only  skimming  the  surface.  I 
would recommend that visually impaired computer users form computer clubs - in this way they can
help keep each other up-to-date. In Ireland we have one called VICS. I would suggest that blind
computer users approach publishers to find out how they produce their magazines. Many are doing
them by desk-top publishing, so it is worth asking them to send the floppy disks and putting these through
the  computer  and  voice  synthesiser.  One  visually  impaired  person  in  Dublin  has  done  this  recently 
with  a  magazine  called  V.S.  News  and  they  are  awaiting  his  views  on  this  method  of  distribution.  He 
has  not  got  the  advertisements  on  the  disks  so  he  is  going  to  see  what  he  can  do  about  that.  This  is 
another  way  of  getting  news  on  topical  matter,  rather  than  going  through  a  reader  and  putting  it  on 
tape. 

The  National  Council  for  the  Blind  in  Ireland  hopes  to  set  up  an  exhibition  centre  in  its  new  premises. 
It  hopes  to  have  a  number  of  computers,  software  and  adaptive  devices  for  persons  to  try  out  and  see 
what  is  best  for  them  before  they  purchase. 

I  hope  I  have  given  you  a  picture  of  what  we  are  trying  to  do  in  Ireland.  Technology  is  there  as  a  tool 
to  help,  not  to  hinder.  It  brings  us  closer  to  each  other,  enabling  us  to  share  and  exchange  ideas  much 
faster  and  easier,  so  we  should  not  have  to  re-invent  the  wheel  and  waste  valuable  time  doing  things 
someone  else  has  already  done. 

National  and  international  research  and  inventory  databases  should  be  well  developed  and  made 
known so that people can have easy access to them. The EC Handynet, when it comes on stream,
should be a key one for everyone to use. The EC Helios Second Action Programme should also help speed
up the use of technology for disabled people, and each country should try to submit good creative
projects for this programme.


Audio  Technology  in  IT  Education  for  the  Blind 


Jill  Hewitt 

Hatfield  Polytechnic,  England. 


Introduction 

At  Hatfield  Polytechnic,  we  have  a  long  history  of  educating  blind  students  in  Information  Sciences. 
This  paper  highlights  the  current  use  of  audio  technology  by  our  blind  students  and  looks  at  it  in  the 
context  of  the  support  environment  needed  by  them.  Some  of  the  problems  encountered  by  our 
students  are  discussed  and  an  idealised  set  of  their  hardware  and  software  requirements  is  developed. 
Finally the significance of our own "Intelligent Speech Driven Interfaces" project to this area is
described. 


Historical  Perspective 

Traditionally,  computing  has  been  a  suitable  career  for  the  blind.  In  a  typical  environment,  a 
programmer  would  use  a  simple  scroll-mode  terminal  or  teletype  to  access  a  central  computer.  The 
scroll-mode  terminal  is  like  an  extension  of  the  typewriter  -  text  is  input  and  edited  on  a  line-by-line 
basis,  so  a  braille  display  which  shows  a  single  line  at  a  time  is  a  suitable  output  medium.  Over  the  past 
few  years,  the  growing  popularity  of  screen-based  editors  and  windowing  systems  has  meant  increasing 
difficulty  of  access  for  the  blind  user.  The  increasing  domination  of  WYSIWYG  (what  you  see  is  what 
you  get)  systems  and  graphical  interfaces  has  compounded  the  problem.  In  addition,  the  nature  and 
diversity  of  the  tasks  performed  on  a  computer  has  changed  significantly. 

A comparison of typical computing tasks performed by our Computer Science degree students in 1978
and 1988 serves to highlight the differences (specialist applications like graphics have been ignored):

1978  Tasks: 

Programming  -  input  and  editing  of  programs  on  a  teletype  using  a  simple  line  editor. 
Typically,  the  programming  language  was  input  all  in  upper-case  characters. 

Computers  used  -  a  mainframe  for  high  level  programming  and  a  micro-processor 
development  kit  for  low  level  work. 

1988  Tasks: 

Programming  -  input  and  editing  of  programs  on  a  variety  of  terminals  and  workstations. 
All systems used support a screen-based editor; some have a line editor. Typically, the
programming  languages  use  a  mixture  of  upper  and  lower  case  characters  and  may  be 
case-sensitive. 

Applications  -  access  to  databases,  spreadsheets,  expert  system  shells  on  a  variety  of 
machines. 


Document Production - creation of essays, reports and projects using a word processor for
text  and  a  drawing  package  for  diagrams. 

Electronic  Mail  -  creation  and  receipt  of  mail  messages  from  other  students  and  staff. 

Computers used - mainframes with two different operating systems, PCs, Macintoshes,
Sun Workstations, microprocessor development kits.

It is plain from the above descriptions that today’s Computer Science students need to be capable of
accessing  several  different  machines  in  a  variety  of  ways. 

Specialist  Equipment 

Last  year  we  had  two  blind  students  in  the  Computer  Science  department.  They  used  the  following 
equipment  to  access  our  computers: 

a  Frank  Audiodata  which  provided  text-to-speech  output  via  an  IBM  PC.  This  could  be 
used  in  stand-alone  mode  with  PC  software  (particularly  Wordstar  for  word  processing 
and  Symphony  for  its  spreadsheet)  or  as  a  terminal  emulator  ("talking  terminal")  to 
access  other  machines  on  the  local  area  network. 

a  portable  Braillink  terminal  which  provided  soft  braille  output,  a  line  at  a  time. 

a braille embosser, available on the network at a remote location, for hard copy output.

Most  of  the  Hatfield  computers  are  networked,  but  the  blind  student  is  at  a  considerable  disadvantage 
if  he  or  she  is  restricted  to  a  single  terminal-type  or  a  single  location,  since  supervised  practical  sessions 
are  given  in  a  variety  of  locations  and  opening  times  of  different  laboratories  vary.  The  students  often 
found  they  could  use  the  Audiodata  during  the  day,  but  as  this  is  not  portable  they  had  to  carry  the 
Braillink  to  the  Computer  Centre  for  use  in  the  evenings.  They  were  not  able  to  access  any  commercial 
applications  on  the  Macintosh  computers  as  they  could  not  get  past  the  mouse  and  icon  environment. 


Audiodata - discussion

In comparison with a Braillink, there are some disadvantages in using an Audiodata system:

it is not portable,

it is faster to read a line of braille than to listen to a line of synthetic speech,

speech is transitory,

synthetic speech is distracting to others (although use of an earphone solves this problem).

The  main  advantage  provided  by  the  Audiodata  is  its  ability  to  quickly  access  any  position  on  the  screen 
and  read  out  its  contents  by  character,  word  or  line.  It  does  not  however  have  any  knowledge  of  the 
underlying  application,  so  for  example  it  cannot  distinguish  between  the  help  and  text  windows  in 
Wordstar  nor  between  the  cell  boundaries  in  a  spreadsheet.  A  student  who  had  been  using  it  for  a  year 
still  had  some  difficulty  in  matching  the  speech  and  text  cursors  and  was  not  familiar  with  all  its 
functions  -  a  more  comprehensive  on-line  help  facility  would  have  been  beneficial. 
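
The review functions described above (reading the screen by character, word or line) can be illustrated with a minimal sketch in Python. This is not the Audiodata's actual implementation: the sample screen contents and the function names read_char, read_word and read_line are assumptions made for illustration. The screen is treated as plain rows of characters, which also shows why such a reader cannot tell a help window from a text window: it sees positions, not application structure.

```python
# Hypothetical sketch of a screen-review layer; names and data are illustrative only.

def read_char(screen, row, col):
    """Return the single character under the review cursor."""
    return screen[row][col]


def read_word(screen, row, col):
    """Return the whitespace-delimited word under the cursor ('' on a blank cell)."""
    line = screen[row]
    if line[col].isspace():
        return ""
    start = col
    while start > 0 and not line[start - 1].isspace():
        start -= 1
    end = col
    while end < len(line) and not line[end].isspace():
        end += 1
    return line[start:end]


def read_line(screen, row):
    """Return a whole screen line without trailing padding."""
    return screen[row].rstrip()


# A two-line stand-in for a text screen, padded to a fixed width.
screen = [
    "A1: Sales    B1: 1200".ljust(40),
    "Press F1 for help".ljust(40),
]
```

The text returned by each function would then be passed to the speech synthesiser; keeping the review cursor separate from the application's own text cursor is what allows the user to explore the screen without moving the insertion point.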


The rather poor quality of the synthetic speech did not seem to present a problem, as the users quickly
became acclimatised to it and were able to understand the output even when spoken very fast; Vincent
(4) and others report similar findings. The lack of a facility to add a personalised exceptions file for
mispronunciations meant that some words were consistently mispronounced.
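
A personalised exceptions file of the kind the students lacked could be as simple as a table of respellings applied before the text reaches the synthesiser. The following Python sketch is one possible approach, not a description of any existing product; the entries in EXCEPTIONS and the function name apply_exceptions are hypothetical.

```python
import re

# Hypothetical user-maintained exceptions: word (lower case) -> phonetic respelling.
EXCEPTIONS = {
    "wysiwyg": "wizzy wig",
    "unix": "you nicks",
}


def apply_exceptions(text, exceptions):
    """Replace whole words found in the exceptions table, ignoring case."""
    def substitute(match):
        word = match.group(0)
        return exceptions.get(word.lower(), word)

    # Only whole alphabetic words are considered, so partial matches are left alone.
    return re.sub(r"[A-Za-z]+", substitute, text)
```

For example, apply_exceptions("A WYSIWYG editor", EXCEPTIONS) yields "A wizzy wig editor", which the synthesiser would then pronounce using its normal rules.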

The  Audiodata  was  also  popular  with  a  partially  sighted  student  who  used  speech  output  to  browse 
through  files  but  relied  on  sight  when  programming  or  word  processing. 


Support  Environment 

In addition to access to a computer terminal, blind students also need other support in their studies. A
typical student attends lectures and seminars, takes notes, reads books, produces essays and reports
and takes exams, as well as using computers as described above. All of these activities can cause some
problems for the blind student; they are dealt with in a variety of ways:


Lecture  Notes 

Some  students  have  used  a  Perkins  Brailler  to  take  notes  in  lectures,  but  generally  seem  to  prefer  using 
a  tape  recorder  which  is  less  conspicuous. 

Most  lecturers  produce  handouts  of  notes.  If  these  are  on  the  system  they  can  be  output  on  the  braille 
printer,  though  they  need  to  be  converted  into  an  appropriate  format  (eg:  diagrams  removed,  tabs  and 
blank  lines  removed,  tables  re-written).  Handwritten  notes  are  usually  read  out  to  the  student  by  one 
of  the  team  of  readers,  although  we  have  occasionally  been  able  to  employ  a  typist  to  input  them  to 
the  computer  for  subsequent  braille  output.  Diagrams  may  be  reproduced  on  swell  paper,  but  usually 
need  to  be  redrawn  or  significantly  altered  to  be  of  use. 
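
Part of the conversion described above (removing tabs and blank lines) lends itself to automation, whereas diagrams and tables still need human judgement. The Python sketch below is a minimal illustration under stated assumptions: the function name prepare_for_braille is hypothetical, and the 40-character line width is an assumed embosser setting, not a figure from our installation.

```python
import textwrap


def prepare_for_braille(text, width=40):
    """Strip tabs and blank lines, collapse runs of spaces, and wrap to a fixed width."""
    output = []
    for raw_line in text.splitlines():
        # Tabs become spaces, then repeated whitespace is collapsed.
        line = " ".join(raw_line.replace("\t", " ").split())
        if not line:
            continue  # drop blank lines entirely
        output.extend(textwrap.wrap(line, width=width))
    return "\n".join(output)
```

The result would still have to be passed through a braille translation step before embossing; this sketch only covers the layout clean-up.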

The  students  preferred  to  have  a  hard  copy  of  their  notes  rather  than  leave  them  on  the  system  and 
browse  via  speech  or  soft  braille  output.  This  may  have  been  influenced  by  the  fact  that  they  did  not 
have  access  to  a  terminal  at  home. 


Reading 

Students  would  like  to  have  text  books  in  braille,  but  very  few  of  the  course  books  are  available.  Some 
taped  books  and  manuals  are  available,  but  this  is  not  a  popular  medium  for  technical  material  as  it 
cannot  be  easily  referenced  in  an  ad  hoc  fashion.  For  this  reason  a  book  reader  (eg:  Kurzweil)  was  not 
perceived  as  very  useful.  Most  of  the  required  reading  is  done  by  a  team  of  readers  working  face  to 
face. 


Essay  and  Report  Writing 

Students are beginning to make more use of on-line word-processing facilities through the Audiodata
but are restricted in that they cannot use it to work at home. They tend to fall back on using a portable
brailler  and  then  typing  a  transcription. 


Exams 

Exam papers are presented in braille and may be taken in one of three ways:

using an amanuensis,

dictated to a tape recorder and typed by an audio typist - an amanuensis will still be needed
to draw diagrams,

written in braille and transcribed to an amanuensis.

We  have  yet  to  experiment  with  on-line  exam  papers,  but  potentially  the  answers  could  be  typed  directly 
into  a  computer  with  hard  copy  braille  output  for  the  student  and  typed  output  for  the  markers. 


Future  Requirements 

Several requirements for improved systems for blind computing students (and probably for blind
people in general) have become apparent from the study of current practice. Some may be implemented
with current technology, and some will be the subject of further research.

There  appears  to  be  a  trade-off  between  the  generality  of  any  system  and  the  extent  to  which  it  can 
meet  the  specific  requirements  of  a  blind  user.  For  example,  the  Audiodata  provides  an  interface 
which allows the blind user to access commercial software such as Wordstar or Lotus 123; it does not,
however, provide extra word processing facilities which might be useful to a blind user. It is essential
that blind users are able to access commercial software, or their employment prospects will disappear.
Many researchers, for example (5) & (6), emphasise the importance of generality in interfaces, but
there  is  also  a  case  for  considering  specialist  software  and  systems  for  individual  use. 

The  mode  of  speech  output  may  need  to  be  application  dependent  -  Sharp  (8)  identifies  a  need  for 
output  to  vary  depending  on  the  application  -  for  example  tabs  and  end  of  line  characters  may  be 
significant  in  a  computer  program  but  not  an  E-mail  message  -  this  would  imply  some  intelligence  in 
the  interface  to  recognise  the  type  of  application,  or  at  least  user-definable  options  on  output. 
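
The idea of application-dependent output can be sketched as a set of user-definable profiles that decide whether layout characters are spoken or silently dropped. This is an illustrative Python sketch only, not the design of any existing interface; the profile names "program" and "mail", the spoken words chosen, and the function to_spoken_text are all assumptions.

```python
# Hypothetical output profiles: how layout characters are rendered per application type.
PROFILES = {
    "program": {"\t": " tab ", "\n": " new line "},  # layout is significant in code
    "mail": {"\t": " ", "\n": " "},                  # layout is ignored in a message
}


def to_spoken_text(text, application):
    """Rewrite layout characters according to the profile for the given application."""
    table = PROFILES[application]
    spoken = "".join(table.get(character, character) for character in text)
    # Collapse the extra spaces introduced by the substitutions.
    return " ".join(spoken.split())
```

With these profiles, the text "if x:" followed by a new line, a tab and "return" is rendered as "if x: new line tab return" for a program but simply as "if x: return" for a mail message. An intelligent interface would select the profile automatically; a simpler one would leave the choice to the user.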


Generalisation 

An  ideal  system  for  a  blind  user  might  consist  of  a  portable  black  box  with  both  speech  and  soft  braille 
output.  This  would  be  capable  of  being  connected  to  any  computer  with  minimal  effort  on  the  part  of 
the  user,  would  have  knowledge  of  windowing  software  and  would  be  capable  of  providing  a  textual 
alternative  to  icons  and  a  keyboard  alternative  to  a  mouse. 

The  development  of  ISO  standards  for  virtual  terminals  will  facilitate  the  introduction  of  such  a  system, 
but  obviously  only  to  the  extent  that  particular  manufacturers  adhere  to  the  standards.  Edwards  (7) 
identifies  the  difficulties  of  building  a  general  interface  to  WIMP  systems  and  concludes  that  at  present 
it  may  not  be  practicable. 

An alternative is to persuade software writers to build "hooks" into applications that can be accessed
from  speech  interfaces  -  in  the  absence  of  legislation,  this  seems  unlikely  to  occur  on  a  large  scale. 


Specialist 

In  addition  to  the  black  box  interface,  there  is  a  need  for  more  software  designed  specifically  to  meet 
the  needs  of  the  blind.  Two  examples  being  developed  at  Hatfield  Polytechnic  are  given  here. 

In  a  study  of  word  processing  requirements  (1)  the  following  list  was  identified  as  important  by  a  blind 
user: 

1.  Insertion  and  moving  of  paragraphs 

2.  Deletion  of  letters  and  words 

3.  Searching  facilities 

4.  Automatic  centering  for  headings  and  footings 

5.  Changeable  margins  on  both  sides  of  the  page 

6.  Spelling  checker/corrector 

7.  A  "quick  pleasant  bleep"  for  every  change 

8.  A  warning  for  dramatic  errors  or  possible  catastrophic  changes 

9.  Commands  for  tables  of  varying  sizes  -  and  the  ability  to  fill  them  in  and  read  them  out 
column  by  column 

These  facilities  are  all  available  in  one  word  processor  or  another,  but  it  is  significant  that  the  user  was 
not  aware  of  a  system  which  provided  them,  and  found  the  word  processors  to  which  she  had  access, 
cumbersome  to  use.  A  prototype  word  processor  incorporating  most  of  these  requirements  has  been 
designed  to  run  on  a  Macintosh  using  its  built-in  speech  capabilities  available  through  the  Macintalk 
desk  accessory  (2)  &  (8).  A  further  enhancement  will  be  to  provide  specialist  facilities  for 
programming  languages. 
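
Requirement 9 in the list above, reading a table out column by column, is straightforward once the table is held as structured data instead of as screen text. The Python sketch below illustrates the idea; it is not taken from the prototype word processor, and the function name read_by_column and the sample data in the note that follows are hypothetical.

```python
def read_by_column(headers, rows):
    """Return one spoken phrase per column: the heading followed by its cells in row order."""
    phrases = []
    for index, header in enumerate(headers):
        cells = ", ".join(str(row[index]) for row in rows)
        phrases.append(f"{header}: {cells}")
    return phrases
```

For a table with headings "Name" and "Mark" and rows ("Ann", 62) and ("Bob", 55), the function returns the phrases "Name: Ann, Bob" and "Mark: 62, 55", each of which would be sent to the speech synthesiser in turn.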

Another  blind  user  suggested  that  a  "talking  diary/telephone  book"  (Filofax)  would  be  very  useful  - 
particularly  if  it  was  small  enough  to  carry  around.  Goodyear  (3)  describes  the  design  and  prototype 
of  such  a  system.  It  was  also  developed  on  a  Macintosh.  More  work  is  needed  to  ascertain  the 
appropriate  type  of  user  interface  for  such  a  system,  but  preliminary  evaluation  suggests  that  it  would 
be  a  useful  tool  even  in  its  present  form. 


An  Integrated  System 

If  the  total  working  environment  of  a  blind  student  is  considered,  we  can  imagine  an  ideal  system 
consisting  of  a  variety  of  compatible  audio  and  braille  devices  and  software.  This  working  environment 
of  the  future  might  consist  of: 

a  portable  document/book  reader  with  the  capability  of  transmitting  the  text  to  disc  for 
subsequent output in braille or later perusal via audio output. Diagrams should be
identified  and  output  on  some  tactile  device  and/or  stored  electronically  and 
subsequently  output  on  swell  paper  with  braille  labels. 

a  personal  workstation  with  network  connections  for  access  to  E-mail,  on-line  databases, 
and other computer software.


a  black  box  interface  device  as  described  above  to  provide  speech  and  braille  access  to  the 
personal  workstation  and  any  other  computers. 

a  portable  and  silent  device  for  note  taking  in  lectures 

specialised software for word processing, spreadsheets, personal databases, etc.

a braille embosser

speech-to-text  software  for  making  quick  notes  and  for  automatic  conversion  of 
tape-recordings  into  text. 

software  for  reformatting  stored  documents  for  subsequent  output  in  braille. 

software  for  interfacing  between  applications,  so  that  for  example  a  standard  spreadsheet 
interface  could  be  learned  and  all  others  translated  into  this  form  for  processing. 

The list could go on; it will be different for different types of user. The important points are that there
is a need for integration of devices and a need for the consideration of human factors for blind users.
Software  development  is  needed  which  improves  the  usability  of  existing  audio  systems  by  cushioning 
the  users  from  the  underlying  application  and  by  providing  specialist  facilities  for  them. 

Vincent  (4)  identified  the  paradox  that  technology,  which  helps  to  solve  problems  experienced  by  blind 
people,  can  also  widen  the  gap  between  those  with  and  without  a  visual  handicap.  The  gap  is  still  there, 
if  not  widening,  and  a  lot  of  work  is  needed  to  close  it. 


The  Hatfield  Polytechnic  "Intelligent  Speech  Driven  Interfaces"  Project  (ISDIP) 

In  this  three-year  NAB  funded  project  we  are  looking  at  the  applicability  of  artificial  intelligence  to 
speech  based  interfaces.  It  may  be  that  the  requirements  of  a  generalised  speech  interface  can  best  be 
served  by  the  utilisation  of  an  intelligent  interface  management  system. 

A  study  of  office  tasks  and  dialogues  will  provide  information  for  an  intelligent  knowledge  base  which 
will  be  used  to  enhance  both  the  speech  generation  and  recognition  processes.  A  user  model  will  be 
built  into  the  system  and  will  evolve  according  to  a  user’s  preferences  and  pattern  of  work.  Each 
application  program  will  be  modelled  by  providing  a  set  of  details  of  the  application  and  its  modes  of 
use.

The  human  factors  of  speech  based  interfaces  will  be  considered  and  an  iterative  development  cycle 
will  be  used  to  produce  demonstrators  of  typical  office  applications  using  speech  input  and  output. 

The  particular  relevance  of  this  project  to  the  blind  is  in  three  areas: 

the  development  of  bespoke  applications  using  speech  output 

the  utilisation  of  the  intelligent  knowledge  base  to  enhance  the  text-to-speech  output  by: 


1.  automatically  adjusting  the  type  of  output  according  to  the  type  of  application 
(eg:  speech  structure  for  a  programming  language) 

2.  providing  a  structure  for  the  synthesis  by  concept  of  information  not  held  in  a 
textual  form  (eg:  in  databases) 

the  investigation  into  the  practicality  of  developing  a  generalised  speech  interface. 


Conclusions 

In  conclusion,  further  work  is  needed  to  integrate  audio  technology  into  a  set  of  equipment  (hardware 
and  software)  which  will  provide  blind  people  with  access  to  current  systems.  There  is  a  need  for 
portable general purpose devices and for specialist applications software. Ideally, conversion software
will  be  developed  which  can  enhance  the  usability  of  commercial  applications  by  presenting  them  in 
a  standardised  interface  format  -  there  may  be  a  requirement  for  intelligent  knowledge  bases  to  provide 
appropriate  models  of  application  and  user. 

Employers  need  to  be  convinced  that  blind  people  can  cope  with  the  variety  of  interfaces  in  use  in 
applications  today  -  to  this  end  it  is  important  not  only  to  develop  appropriate  tools  but  to  make  them 
available  to  students  so  that  they  can  develop  marketable  skills.  The  Manpower  Services  Commission 
will  provide  equipment  for  blind  people  in  employment,  but  there  is  currently  NO  mechanism  for 
providing  blind  students  in  Higher  Education  with  the  equipment  they  need  -  they  must  rely  on  the 
good-will  and  budgets  of  the  individual  organisations.  A  new  policy  is  urgently  required  to  ensure  that 
students  can  gain  access  to  appropriate  equipment. 

This  paper  has  concentrated  on  the  needs  of  blind  Computer  Science  students  -  a  tiny  minority  of  the 
blind  community,  but  important  because  they  are  amongst  the  first  users  of  new  technology.  If  they 
have  difficulty  with  it,  the  chances  are  not  good  for  the  non-specialists. 


References 

1. An Investigation into the Specification of a Word Processor with Special Facilities for the Blind. B.A.
Smith, Hatfield Polytechnic MSc Project, 1986.

2. The Development of a Prototype Word Processor for the Blind. Underwood, Hatfield Polytechnic
BSc Project, 1988.

3. The Design and Implementation of a Prototype Speaking Diary and Phone Book Application for use
by the Blind and Partially Sighted. J.E. Goodyear, Hatfield Polytechnic MSc Project, 1987.

4. Blind Students and Distance Education: Some Experiences with Micro Computers and Synthetic
Speech. T. Vincent, Open University. PLET 23,1.

5. Computing for the Blind User. Aries Arditi & Arthur Gillman, Byte, March 1986.

6. Design of a talking videotext terminal for the Blind. Oluwole R. Omotayo, Radio & Electronic
Engineer, March 1984.

7. Adapting User Interfaces for Visually Disabled Users. A. Edwards, Open University PhD Thesis, 1987.

8. An Evaluation of the Apple Macintosh as a Computer for the Blind and Partially Sighted. D.C. Sharp,
Hatfield Polytechnic MSc Project, 1987.


Audio  Technology  for  Blind  Programmers 

Gerry  Ellis 

Irish  Computer  Society,  Ireland 


Introduction 

Being  blind  myself,  I  shall  be  speaking  very  much  from  a  user’s  point  of  view.  Research  resources 
directed  at  audio  technology  and  associated  areas  in  Ireland  have  been  fairly  restricted  up  until  now. 
Most  of  the  actual  developments  have  been  carried  out  in  third  level  institutions  -  Trinity  College, 
University  College  Galway,  the  National  Institute  for  Higher  Education  and  by  individuals  in 
association  with  University  College  Cork.  VTEK,  one  of  the  largest  companies  in  the  field,  have  their 
European  headquarters  in  Dublin,  but  this  does  not  include  a  research  unit. 

The  following  are  some  examples  of  projects  successfully  embarked  upon  in  the  last  few  years: 

1.  A  word-processor  and  braille  production  system  have  been  devised  to  make  full  college 
computer  facilities  available  to  the  visually  impaired  student. 

2.  Modifications  have  been  made  to  BBC  micro  systems.  Amongst  other  uses,  this 
personalised  work  station  has  been  used  to  interpret  ham  radio  signals. 

3.  A  PC-based  braille  production  centre  has  been  established  and  a  Kurzweil  reading 
machine  has  been  connected  to  a  PC.  These  moves  greatly  improve  the  efficiency  of 
translation  of  the  written  word  to  magnetic  media  or  to  braille. 

4.  Publishers  have  been  prepared  to  make  books  available  on  magnetic  media,  greatly 
improving  access. 

5.  A  student  project  was  undertaken  to  investigate  how  to  start  making  the  Apple 
Macintosh accessible to the visually impaired.

6.  An  investigation  is  in  progress  into  methods  of  improving  strategies  of  speech  production. 

7.  Related  work  has  been  undertaken  in  the  Central  Remedial  Clinic  to  aid  the 
speech-handicapped,  but  their  mandate  does  not  allow  for  specific  support  of  the 
visually  impaired. 


Pressing  Problems 

The  problems  to  be  solved  are  still  many.  As  stated  by  Mr  Kenny  earlier,  for  each  step  forward  the 
visually  impaired  take,  the  able-bodied  computer  user  bounds  forward.  As  a  user,  my  assessment  is 
that the following are some of the most pressing problems for the visually impaired computer user:


1. Standardisation of speech control characters, communication systems, etc is almost non-existent.
This  area  needs  particular  and  speedy  attention  to  allow  mobility  of  the  visually  impaired  computer 
user  across  systems. 

2. The price of voice synthesizer systems is a major deterrent to many potential computer users. It
should  be  the  aim  of  developers  to  make  the  best  possible  systems  available  at  a  low  price. 

3. Graphics, Icons, Windows and the use of the Mouse are trends which increasingly preclude
the  visually  impaired  from  using  state-of-the-art  software.  Solutions  to  these  problems  are  urgently 
required. 

4.  To  use  systems  and  to  keep  up  to  date  with  developments,  access  to  manuals,  magazines  and 
training  material  is  required.  As  these  are  not  readily  available  on  computer-readable  media,  cheap 
scanning  devices  are  needed. 

5. It would seem that each developer of voice synthesis systems still does so in a quite independent
fashion. Some ISO-type standards should be developed as this would greatly reduce the cost to the
consumer, and promote portability across systems.

6.  It  would  seem  that  the  major  developments  are  aimed  at  improving  the  quality  and  speed  of  the 
better voice systems. In general, quality and speed are not as immediate a priority as encouraging
potential  users  to  take  the  first  step.  For  personal  use,  in  particular,  the  price  of  an  acceptable  system 
is  often  the  major  deterrent  to  this.  Once  the  price  is  right,  the  larger  number  of  users  should  help  to 
sustain  it  and,  indeed,  to  fire  further  improvements. 


For  the  blind,  especially,  computers  do  not  just  represent  employment  possibilities  or  toys,  but  vital 
tools  on  the  road  to  equality  in  almost  all  aspects  of  social  integration.  The  direction  and  effectiveness 
of  future  research  will  determine  the  efficiency  of  such  systems. 




Summary  and  Recommendations 


Emerson  Foulke 

Perceptual  Alternatives  Laboratory,  USA 

1.  Standardised  Synthesizing  of  Pronunciations 

As  matters  presently  stand,  there  are  wide  differences  in  the  procedures  for  synthesizing  the 
pronunciation  of  words  in  different  languages.  A  unified  and  standardised  procedure  for 
accomplishing this task would make it much easier to extend the benefits of synthesis to new languages.
An  example  of  how  this  might  be  accomplished  is  offered  by  the  special  programming  language  written 
for  this  purpose  by  Granstrom  and  his  colleagues. 


2.  Languages  not  Suitable  for  Synthesis 

In  many  countries  where  non-European  languages  are  spoken,  and  in  some  European  countries  as 
well,  there  are  large  numbers  of  blind  persons  who  are  prevented  from  realising  the  benefits  of 
computers  because,  at  present,  the  language  they  speak  cannot  be  displayed  as  synthesized  speech.  In 
many  of  these  countries,  the  resources  required  for  the  solution  of  this  problem  are  not  available.  In 
order  to  extend  the  benefits  of  computer  use  to  these  blind  persons,  steps  should  be  taken  to  determine 
and  implement  algorithms  for  synthesizing  the  words  in  such  languages. 

3.  Quality  of  Synthesized  Speech 

Although  synthesized  speech  does  not  have  to  be  perfect  to  be  useful,  the  quality  of  the  listening 
experience  is  a  factor  that  deserves  high  priority.  Efforts  to  improve  the  synthesis  of  speech  should  be 
guided  by  tests  of  speech  intelligibility  which  are  capable  not  only  of  detecting  differences  among 
specimens of synthesized speech, but also of yielding diagnostic information and prescriptive
guidelines. 

4.  Maximising  Intelligibility  of  Synthesized  Speech 

The  intelligibility  of  synthesized  speech  and  recorded  natural  speech  should  be  maximised  for  as  many 
users  as  possible.  Accordingly,  there  should  be  an  investigation  of  the  possibility  that  there  may  be 
alterations  of  the  synthesized  signal  which  would  make  it  more  intelligible  to  persons  with  various 
hearing  defects,  without  reducing  its  intelligibility  for  users  with  normal  hearing.  This 
recommendation  is  perhaps  best  clarified  by  an  analogy. 

Dr  Joseph  Wiedel,  a  cartographer  at  the  University  of  Maryland,  makes  maps  that  are  to  be  observed 
by  touch.  The  elements  of  display  are  raised  above  the  surface  on  such  maps  so  that  they  can  be  felt. 
In  order  to  make  the  same  maps  useful  to  visually  impaired  persons  who  retain  some  colour  vision, 
map  features  are  also  differentiated  by  colour.  Furthermore,  colours  are  chosen  so  that  the 
differentiation they achieve can also be appreciated by persons who are colour blind. Thus, the range
of persons who can use these maps is extended without diminishing their usefulness to the users for
whom  they  were  primarily  intended. 

If  there  is  no  single  shaping  of  a  synthesized  or  natural  speech  utterance  that  will  make  it  more 
intelligible to a larger number of listeners, it may still be feasible to prepare different renditions of
synthesized  or  natural  speech  utterances  for  different  categories  of  listeners,  eg  a  category  of  listeners 
with  hearing  losses  in  the  frequency  range  where  most  speech  information  is  to  be  found. 

Finally,  if  the  requirements  for  good  hearing  are  so  individualistic  that  useful  categories  cannot  be 
defined, it may be necessary to provide playback machines with graphic equalizers, so that individual
listeners  can  adjust  speech  signals  to  meet  individual  requirements.  The  treble  and  bass  controls  on 
many  speech  reproducers  now  in  use  have  been  designed  to  allow  the  listener  with  normal  hearing  to 
make  adjustments  that  will  impart  to  the  audio  signal  a  more  pleasing  quality,  but  these  adjustments 
are  not  adequate  for  a  listener  who  must  compensate  for  a  critical  hearing  loss. 
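
The contrast between two tone controls and a graphic equalizer can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the band centres, the made-up listener, and the simple rule of boosting each band by the listener's loss up to a ceiling. It is not a procedure proposed at the workshop.

```python
# Illustrative only: the band centres and the clamping rule are assumptions.
BAND_CENTRES_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]


def band_gains_db(loss_db, max_boost_db=20.0):
    """Return one boost per band: the listener's loss, clamped to 0..max_boost_db."""
    return [min(max(loss, 0.0), max_boost_db) for loss in loss_db]


def db_to_amplitude(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


# A made-up listener whose loss is concentrated around 2-4 kHz.
example_loss_db = [0, 0, 5, 10, 25, 30, 15]
example_gains = band_gains_db(example_loss_db)
```

A pair of bass and treble controls can tilt the spectrum only at its two ends, whereas the per-band gains above can lift a narrow region such as 2-4 kHz while leaving the neighbouring bands almost untouched.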

5.  Portable  Synthesizers 

Blind  computer  operators  need  a  very  small,  portable,  battery-operated  speech  synthesizer  which  can 
easily  be  moved  from  computer  to  computer,  so  that  they  can  be  accommodated  in  a  wide  variety  of 
work  settings.  A  hardware  component  consisting  only  of  a  formant  synthesizer,  together  with  a 
text-to-speech  program  in  software  would  be  one  way  to  achieve  this  flexibility. 

A  blind  computer  operator  could  prepare  a  computer  for  his  or  her  use  simply  by  connecting  the 
synthesizer  to  an  appropriate  port,  and  by  loading  the  text-to-speech  program.  A  synthesizer  of  this 
type  is  more  flexible  than  a  synthesizer  that  must  be  installed  by  plugging  a  card  in  a  slot  inside  the 
computer.  According  to  Dr  Steeneken,  the  Philips  MEA  8000  is  a  synthesizer  that  comes  close  to 
meeting  these  requirements. 

6.  Digital  Recording  for  Archives 

At  its  present  state  of  development,  digital  recording  technology  is  quite  suitable  for  recording  speech 
that  is  intended  for  archival  storage.  Because  industry  standards  have  not  yet  been  achieved,  the 
long-range  utility  of  any  given  embodiment  of  digital  recording  technology  is  uncertain.  As  a  result  of 
rapidly  developing  technology,  a  particular  embodiment  may  not  be  supported  by  its  maker  in  a  few 
years.  However,  in  the  case  of  archival  storage,  this  is  not  a  serious  problem.  The  permanence  and  the 
fidelity  of  the  record  that  is  achieved  by  digital  recording  strongly  recommend  it  for  use  in  preparing 
archival  recordings,  and,  if  obsolescence  of  equipment  appears  to  be  a  problem,  the  equipment  used 
to  prepare  the  recordings  can  be  placed  in  the  archives  as  well. 

7.  Digital  Recording  for  Talking  Books 

The  employment  of  digital  recording  techniques  to  prepare  the  books  and  articles  used  by  blind  people 
who  read  by  listening  is  a  very  attractive  possibility.  The  recordings  used  by  readers  would  have  high 
fidelity, would not be easily damaged, and would last many years. Furthermore, large quantities of text
could be contained in small packages. However the uncertainty engendered by a technology in flux
poses  a  serious  problem  for  this  application. 

In  order  to  implement  it,  a  large-scale  production  facility  would  be  required  to  produce  recorded 
reading  matter  in  the  quantity  required  by  a  national  readership.  In  addition,  readers  would  have  to 
be  provided  with  the  equipment  needed  for  the  reproduction  of  digital  recordings.  If  a  change  in 
technology  rendered  the  recording  and  the  reproducing  equipment  obsolete,  a  large  investment  in 
time,  effort  and  money  would  be  lost. 

Therefore,  although  the  digital  recording  of  the  books  and  articles  used  by  blind  persons  who  read  by 
listening  cannot  be  recommended  at  this  time,  industry  developments  should  be  monitored  closely,  so 
that  when  the  future  has  been  rendered  more  predictable  by  the  adoption  of  industry  standards,  it  will 
be  possible  to  realise  the  advantages  of  digital  recording,  without  delay. 

Any  organisation  that,  as  a  result  of  obsolescence  of  equipment  now  in  use,  is  confronted  with  the 
necessity  of  making  an  immediate  decision  concerning  the  way  in  which  recorded  books  and  articles 
should  be  prepared,  might  consider  conversion  to  the  4-track  15/16  ips  cassette  as  an  interim  solution. 
This  is  a  proven  technology  that  will  probably  enjoy  many  years  of  industry  support  before  it  yields  to 
more  advanced  technology. 


8.  Alteration  in  Production  of  Synthesized  Speech 

It  is  a  relatively  simple  matter  to  alter  production  of  synthesized  speech  in  a  number  of  ways.  To  cite 
a  familiar  example,  many  blind  computer  operators  listen  at  accelerated  word  rates  to  the  synthesized 
speech  displayed  by  their  computers.  When  listening  to  familiar  messages,  such  as  prompts,  some 
claim  that  they  can  understand  speech  at  word  rates  as  high  as  500  wpm.  Ordinarily,  acceleration  is 
accomplished  by  a  non-selective  process  in  which  the  production  time  of  any  segment  of  the  synthesized 
speech  signal  is  reduced  by  the  same  proportion  as  any  other  segment. 

However  it  would  not  be  difficult  to  operate  on  the  speech  signal  selectively,  shortening  some  segments 
considerably  and  others  hardly  at  all,  and  it  may  be  that  the  effect  of  shortening  on  intelligibility 
depends,  in  part,  on  the  segment  of  the  signal  that  is  to  be  shortened.  If  this  is  the  case,  then  it  may  be 
possible,  by  selective  treatment  of  the  synthesized  speech  signal,  to  reduce  the  loss  in  intelligibility  that 
results  when  word  rate  is  increased. 

This  possibility  should  be  investigated,  because  the  ability  to  comprehend  synthesized  speech  at 
significantly  accelerated  word  rates  would  allow  blind  operators  to  interact  with  the  computers  they 
use  at  a  pace  that  compares  more  favourably  with  the  pace  at  which  sighted  operators  interact  with  the 
computers  they  use.  In  addition,  synthesized  speech  that  remains  intelligible  at  accelerated  rates  would 
be  useful  to  those  blind  persons  who  read  electronic  text  by  listening  to  synthesized  speech. 
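
The difference between non-selective and selective shortening described in this section can be sketched as follows. The segment representation and the "compressibility" weight (0 for a segment that should not be shortened, 1 for one that may be shortened freely) are hypothetical; which segments tolerate shortening is exactly what the proposed investigation would have to establish.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    label: str
    duration_ms: float
    compressibility: float  # hypothetical weight between 0 and 1


def compress_uniform(segments, rate):
    """Non-selective: every segment is shortened by the same proportion."""
    return [Segment(s.label, s.duration_ms / rate, s.compressibility)
            for s in segments]


def compress_selective(segments, rate):
    """Selective: same total duration, but the saving is taken mostly
    from the segments marked as more compressible."""
    total = sum(s.duration_ms for s in segments)
    saving_needed = total - total / rate
    # Largest saving available if each segment gave up its weighted share.
    budget = sum(s.duration_ms * s.compressibility for s in segments)
    if saving_needed > budget:
        raise ValueError("requested rate exceeds what the segments allow")
    share = saving_needed / budget if budget else 0.0
    return [Segment(s.label,
                    s.duration_ms * (1 - s.compressibility * share),
                    s.compressibility)
            for s in segments]
```

Both functions produce an utterance of the same overall length for a given rate. They differ only in where the time is removed, which is the variable whose effect on intelligibility is in question.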


9.  Background  Noise 

One  of  the  factors  that  affects  the  intelligibility  of  speech  is  the  background  of  noise  against  which  it 
is displayed. Because synthesized speech is, at best, less intelligible than natural speech, it is more
vulnerable to noise than natural speech. When it is used for public announcements in noisy settings
such  as  underground  stations,  noise  becomes  a  serious  problem. 

However,  because  of  the  ease  with  which  characteristics  of  the  synthesized  speech  signal  can  be 
selectively  altered,  it  is  feasible  to  search  for  a  combination  of  characteristics  that  minimizes  its 
vulnerability  to  noise.  The  investigation  should  be  undertaken  because  synthesized  speech  that 
remains  intelligible  under  adverse  conditions  will,  by  making  talking  signs  more  effective,  extend  the 
range  of  their  application. 

Possible  applications  include  the  status  announcements  displayed  by  automatic  cash  dispensers, 
checkout  counters,  ticket  vending  machines  and  food  vending  machines,  announcements  of  stops  on 
buses  and  trains,  and  announcements  that  identify  building  entrances,  street  intersections  and  so  forth. 


10.  Interaction  with  Computers 

Interaction with computers such as the Macintosh, the PS/2 with OS/2 and the Presentation Manager,
that  depends  on  the  interpretation  of  icons  and  the  use  of  a  mouse,  is  likely  to  become  the  dominant 
form  of  interaction  in  the  near  future.  Many  of  these  computers  and  the  operating  systems  they  use 
may  be  designed  in  such  a  way  that  no  other  form  of  interchange  is  feasible. 

This  development  poses  a  serious  threat  to  computer  operators  whose  interaction  depends  on  the  use 
of  a  text-to-speech  program  and  a  speech  synthesizer.  Consequently,  an  investigation  should  be 
undertaken  to  devise  a  solution  to  this  problem  before  it  becomes  serious. 


11.  Flexibility  between  Computers 

In  the  work  place  the  sighted  computer  operator  can,  if  necessary,  do  work  on  different  computers. 
For instance, he or she may find it convenient to construct and use spreadsheets on a machine of the
PC type, and to move to a Macintosh when desktop publishing is required. Blind computer operators
do  not  enjoy  this  flexibility  because  the  text-to-speech  programs  and  speech  synthesizers  they  use 
cannot,  due  to  lack  of  standardization,  be  moved  freely  from  one  computer  to  another. 

The  specification  of  standards  that  would  make  text-to-speech  programs  and  speech  synthesizers  more 
universally  useful  is  a  desirable  objective,  because  the  observance  of  such  standards  would  increase 
the flexibility of blind computer operators in meeting the demands of the work place.

12.  Systems  to  Design  New  Equipment  and  Software 

Blind  computer  operators  cannot  interact  with  the  systems  now  used  for  the  design  of  new  equipment 
and  software.  Thus  they  are,  in  some  degree,  excluded  from  participation  in  the  design  of  equipment 
they  will  have  to  use.  If  steps  can  be  taken  that  will  allow  them  to  enter  more  effectively  into  the  process 
of  designing,  the  insights  they  are  uniquely  qualified  to  offer  can  make  a  valuable  contribution  to  the 
design  of  new  equipment. 




13.  Legislation  for  Accessibility 

Legislation  should  be  considered  that  would  require  computer  manufacturers  to  guarantee 
accessibility  by  blind  computer  operators  to  the  computers  they  make. 


Other  Publications 




P L Emiliani (Ed.): Development of Electronic Aids for the Visually Impaired, Martinus Nijhoff/
Dr W Junk Publishers, Dordrecht, 1986, pp. 1-312.

P L Emiliani: Concerted Research Project on Rehabilitation of the Visually Impaired, CEE Workshop
on Technical Aids for Communication, Manipulation and Environmental Control, Paris,
G Cochrane, C Hamonet, Eds., Fondazione Pro Iuventute, Milano, pp. 75-78.

M Truquet (Ed.): Proceedings of the workshop on Production of Hardcopy Materials for the Blind,
Toulouse, October 14-16, 1986, in press.

P L Emiliani (Ed.): Proceedings of the workshop on Communication Systems for the Blind, Florence,
November 18-20, 1986, in press.

J M Gill (Ed.): Proceedings of the workshop on Network Terminals for the Visually Disabled,
Datchet, June 1987.


